Perturbation of linear forms of singular vectors under Gaussian noise

Vladimir Koltchinskii and Dong Xia 


Abstract. Let $A \in \mathbb{R}^{m\times n}$ be a matrix of rank $r$ with singular value
decomposition (SVD) $A = \sum_{k=1}^{r} \sigma_k (u_k \otimes v_k)$, where
$\{\sigma_k, k = 1,\dots,r\}$ are singular values of $A$ (arranged in a non-increasing
order) and $u_k \in \mathbb{R}^m$, $v_k \in \mathbb{R}^n$, $k = 1,\dots,r$ are the corresponding
left and right orthonormal singular vectors. Let $\tilde A = A + X$ be a noisy
observation of $A$, where $X \in \mathbb{R}^{m\times n}$ is a random matrix with i.i.d.
Gaussian entries, $X_{ij} \sim \mathcal{N}(0,\tau^2)$, and consider its SVD
$\tilde A = \sum_{k=1}^{m\wedge n} \tilde\sigma_k (\tilde u_k \otimes \tilde v_k)$ with singular
values $\tilde\sigma_1 \ge \dots \ge \tilde\sigma_{m\wedge n}$ and singular vectors
$\tilde u_k, \tilde v_k$, $k = 1,\dots,m\wedge n$.

The goal of this paper is to develop sharp concentration bounds
for linear forms $\langle \tilde u_k, x\rangle$, $x \in \mathbb{R}^m$ and
$\langle \tilde v_k, y\rangle$, $y \in \mathbb{R}^n$ of the perturbed
(empirical) singular vectors in the case when the singular values of $A$
are distinct and, more generally, concentration bounds for bilinear forms
of projection operators associated with SVD. In particular, the results
imply upper bounds of the order $O\Bigl(\sqrt{\frac{\log(m+n)}{m\vee n}}\Bigr)$ (holding
with a high probability) on

$$\max_{1\le i\le m} \bigl|\langle \tilde u_k - \sqrt{1+b_k}\,u_k, e_i^m\rangle\bigr| \quad\text{and}\quad \max_{1\le j\le n} \bigl|\langle \tilde v_k - \sqrt{1+b_k}\,v_k, e_j^n\rangle\bigr|,$$

where $b_k$ are properly chosen constants characterizing the bias of
empirical singular vectors $\tilde u_k, \tilde v_k$ and $\{e_i^m, i = 1,\dots,m\}$,
$\{e_j^n, j = 1,\dots,n\}$ are the canonical bases of $\mathbb{R}^m$, $\mathbb{R}^n$, respectively.
1. Introduction and main results

Analysis of perturbations of singular vectors of matrices under a random noise 
is of importance in a variety of areas including, for instance, digital signal 
processing, numerical linear algebra and spectral based methods of community
detection in large networks (see the cited works and
references therein). Recently, random perturbations of singular vectors have
been studied in Vu, Wang, O'Rourke et al. [3], and Benaych-Georges and
Nadakuditi [1]. However, to the best of our knowledge, this paper proposes the first
sharp results concerning concentration of the components of singular vectors 
of randomly perturbed matrices. At the same time, there has been interest in 
the recent literature in the so-called “delocalization” properties of eigenvectors
of random matrices, see Vershynin, Vu and Wang and references
therein. In this case, the “information matrix” $A$ is equal to zero, $\tilde A = X$
and, under certain regularity conditions, it is proved that the magnitudes of 
the components for the eigenvectors of X (in the case of symmetric square 
matrix) are of the order O () with a high probability. This is somewhat 
similar to the results on “componentwise concentration” of singular vectors 
of $\tilde A = A + X$ proved in this paper, but the analysis in the case when $A \neq 0$
is quite different (it relies on perturbation theory and on the condition that 
the gaps between the singular values are sufficiently large). 

Later in this section, we provide a formal description of the problem 
studied in the current paper. Before this, we introduce the notations that will 
be used throughout the paper. For nonnegative $K_1, K_2$, the notation $K_1 \lesssim K_2$
(equivalently, $K_2 \gtrsim K_1$) means that there exists an absolute constant $C > 0$
such that $K_1 \le C K_2$; $K_1 \asymp K_2$ is equivalent to $K_1 \lesssim K_2$ and $K_2 \lesssim K_1$
simultaneously. In the case when the constant $C$ might depend on $\gamma$, we
provide these symbols with subscript $\gamma$: say, $K_1 \lesssim_\gamma K_2$. There will be many
constants involved in the arguments that may evolve from line to line. 

In what follows, $\langle\cdot,\cdot\rangle$ denotes the inner product of finite-dimensional
Euclidean spaces. For $N \ge 1$, $e_j^N$, $j = 1,\dots,N$ denotes the canonical basis
of the space $\mathbb{R}^N$. If $P$ is the orthogonal projector onto a subspace $L \subset \mathbb{R}^N$,
then $P^\perp$ denotes the projector onto the orthogonal complement $L^\perp$. With a
minor abuse of notation, $\|\cdot\|$ denotes both the $\ell_2$-norm of vectors in
finite-dimensional spaces and the operator norm of matrices (i.e., their largest
singular value). The Hilbert-Schmidt norm of matrices is denoted by $\|\cdot\|_2$.
Finally, $\|\cdot\|_\infty$ is adopted for the $\ell_\infty$-norm of vectors.


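In what follows, $A' \in \mathbb{R}^{n\times m}$ denotes the transpose of a matrix
$A \in \mathbb{R}^{m\times n}$. The following mapping
$\Lambda : \mathbb{R}^{m\times n} \to \mathbb{R}^{(m+n)\times(m+n)}$ will be frequently used:

$$\Lambda(A) := \begin{pmatrix} 0 & A \\ A' & 0 \end{pmatrix}, \quad A \in \mathbb{R}^{m\times n}.$$

Note that the image $\Lambda(A)$ is a symmetric $(m+n)\times(m+n)$ matrix.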
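As an illustration of the mapping just introduced, the following is a minimal sketch in plain Python (nested lists, no external libraries). The function name `dilation` and the sample matrix are assumptions made for this example only; they do not appear in the paper.

```python
def dilation(a):
    """Return the symmetric (m+n) x (m+n) block matrix [[0, A], [A', 0]].

    `a` is an m x n matrix given as a list of rows (illustrative helper).
    """
    m, n = len(a), len(a[0])
    size = m + n
    b = [[0.0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            b[i][m + j] = a[i][j]  # upper-right block holds A
            b[m + j][i] = a[i][j]  # lower-left block holds the transpose A'
    return b


# Hypothetical 2 x 3 input, so the image is 5 x 5.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
B = dilation(A)
```

The two diagonal blocks stay zero, which is why the image is symmetric regardless of the shape of the input.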

Vectors $u \in \mathbb{R}^m$, $v \in \mathbb{R}^n$, etc. will be viewed as column vectors (or
$m\times 1$, $n \times 1$, etc. matrices). For $u \in \mathbb{R}^m$, $v \in \mathbb{R}^n$, denote by
$u \otimes v$ the matrix $uv' \in \mathbb{R}^{m\times n}$. In other words, $u \otimes v$ can be viewed
as a linear transformation from $\mathbb{R}^n$ into $\mathbb{R}^m$ defined as follows:
$(u \otimes v)x = u\langle v, x\rangle$, $x \in \mathbb{R}^n$.

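Let $A \in \mathbb{R}^{m\times n}$ be an $m \times n$ matrix and let

$$A = \sum_{i=1}^{m\wedge n} \sigma_i (u_i \otimes v_i)$$

be its singular value decomposition (SVD) with singular values
$\sigma_1 \ge \dots \ge \sigma_{m\wedge n} \ge 0$, orthonormal left singular vectors
$u_1,\dots,u_{m\wedge n} \in \mathbb{R}^m$ and orthonormal right singular vectors
$v_1,\dots,v_{m\wedge n} \in \mathbb{R}^n$. If $A$ is of rank $\mathrm{rank}(A) = r \le m\wedge n$,
then $\sigma_i = 0$, $i > r$ and the SVD can be written as
$A = \sum_{i=1}^{r} \sigma_i (u_i \otimes v_i)$. Note that in the case when there are
repeated singular values $\sigma_i$, the singular vectors are not unique. In this
case, let $\mu_1 > \dots > \mu_d > 0$ with $d \le r$ be distinct singular values of $A$
arranged in decreasing order and denote $\Delta_k := \{i : \sigma_i = \mu_k\}$,
$k = 1,\dots,d$. Let $\nu_k := \mathrm{card}(\Delta_k)$ be the multiplicity of $\mu_k$,
$k = 1,\dots,d$. Denote

$$P_k^{uv} := \sum_{i\in\Delta_k} (u_i \otimes v_i), \qquad P_k^{vu} := \sum_{i\in\Delta_k} (v_i \otimes u_i),$$

$$P_k^{uu} := \sum_{i\in\Delta_k} (u_i \otimes u_i), \qquad P_k^{vv} := \sum_{i\in\Delta_k} (v_i \otimes v_i).$$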

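It is straightforward to check that the following relationships hold:

$$(P_k^{uu})' = P_k^{uu}, \quad (P_k^{uu})^2 = P_k^{uu}, \quad P_k^{vu} = (P_k^{uv})', \quad P_k^{uv}P_k^{vu} = P_k^{uu}. \tag{1.1}$$

This implies, in particular, that the operators $P_k^{uu}$, $P_k^{vv}$ are orthogonal
projectors (in the spaces $\mathbb{R}^m$, $\mathbb{R}^n$, respectively). It is also easy to check that

$$P_k^{uu}P_{k'}^{uu} = 0, \quad P_k^{uu}P_{k'}^{uv} = 0, \quad P_k^{vu}P_{k'}^{uu} = 0, \quad P_k^{vu}P_{k'}^{uv} = 0, \quad k \neq k'. \tag{1.2}$$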

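The SVD of matrix $A$ can be rewritten as $A = \sum_{k=1}^{d} \mu_k P_k^{uv}$ and it can
be shown that the operators $P_k^{uv}$, $k = 1,\dots,d$ are uniquely defined. Let

$$B = \Lambda(A) = \begin{pmatrix} 0 & A \\ A' & 0 \end{pmatrix} = \sum_{k=1}^{d} \mu_k \begin{pmatrix} 0 & P_k^{uv} \\ P_k^{vu} & 0 \end{pmatrix}.$$

For $k = 1,\dots,d$, denote

$$P_k := \frac{1}{2}\begin{pmatrix} P_k^{uu} & P_k^{uv} \\ P_k^{vu} & P_k^{vv} \end{pmatrix}, \qquad P_{-k} := \frac{1}{2}\begin{pmatrix} P_k^{uu} & -P_k^{uv} \\ -P_k^{vu} & P_k^{vv} \end{pmatrix},$$

and also

$$\mu_{-k} := -\mu_k.$$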

Using relationships (1.1), (1.2), it is easy to show that
$P_kP_{k'} = P_{k'}P_k = \mathbb{1}(k = k')P_k$ for all $k, k'$, $1 \le |k| \le d$, $1 \le |k'| \le d$.
Since the operators $P_k : \mathbb{R}^{m+n} \to \mathbb{R}^{m+n}$, $1 \le |k| \le d$ are also symmetric,
they are orthogonal projectors onto mutually orthogonal subspaces of $\mathbb{R}^{m+n}$.
Note that, by a simple algebra, $B = \sum_{1\le|k|\le d} \mu_k P_k$, implying that $\mu_k$ are
distinct eigenvalues of $B$ and $P_k$ are the corresponding eigenprojectors. Note
also that if $2\sum_{k=1}^{d} \nu_k < m+n$, then zero is also an eigenvalue of $B$ (that
will be denoted by $\mu_0$) of multiplicity $\nu_0 := n + m - 2\sum_{k=1}^{d}\nu_k$.
Representation

$$B = \Lambda(A) = \begin{pmatrix} 0 & A \\ A' & 0 \end{pmatrix}$$

will play a crucial role in what follows since it allows us to reduce the analysis
of SVD for matrix $A$ to the spectral representation $B = \sum_{1\le|k|\le d} \mu_k P_k$. In
particular, the operators $P_k^{uv}$ involved in the SVD $A = \sum_{k=1}^{d} \mu_k P_k^{uv}$ can be
recovered from the eigenprojectors $P_k$ of matrix $B$ (hence, they are uniquely
defined). Define also

$$\theta_i := \frac{1}{\sqrt 2}\begin{pmatrix} u_i \\ v_i \end{pmatrix} \quad\text{and}\quad \theta_{-i} := \frac{1}{\sqrt 2}\begin{pmatrix} u_i \\ -v_i \end{pmatrix} \quad\text{for } i = 1,\dots,r,$$

and let $\Delta_{-k} := \{-i : i \in \Delta_k\}$, $k = 1,\dots,d$. Then, $\theta_i$, $1 \le |i| \le r$ are
orthonormal eigenvectors of $B$ (not necessarily uniquely defined) corresponding
to its non-zero eigenvalues $\sigma_1 \ge \dots \ge \sigma_r > 0 > \sigma_{-r} \ge \dots \ge \sigma_{-1}$ with
$\sigma_{-i} = -\sigma_i$ and

$$P_k = \sum_{i\in\Delta_k} (\theta_i \otimes \theta_i), \quad 1 \le |k| \le d.$$
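The eigenvector claim for the symmetric dilation can be checked numerically on a toy rank-one matrix. The sketch below is only an illustration in plain Python; the helper names (`dilation`, `matvec`, `residual`) and the particular vectors are assumptions chosen for the example.

```python
import math


def dilation(a):
    """Symmetric block matrix [[0, A], [A', 0]] for an m x n list-of-rows `a`."""
    m, n = len(a), len(a[0])
    b = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            b[i][m + j] = a[i][j]
            b[m + j][i] = a[i][j]
    return b


def matvec(b, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in b]


def residual(b, theta, lam):
    """Largest entry of |B theta - lam * theta|; zero for an exact eigenpair."""
    return max(abs(y - lam * t) for y, t in zip(matvec(b, theta), theta))


# Toy rank-one matrix A = sigma * (u (x) v) with unit vectors u and v.
sigma = 2.0
u = [1.0, 0.0]
v = [0.6, 0.8]
A = [[sigma * ui * vj for vj in v] for ui in u]
B = dilation(A)

scale = 1.0 / math.sqrt(2.0)
theta_plus = [scale * c for c in u + v]                         # (u; v) / sqrt(2)
theta_minus = [scale * c for c in u] + [-scale * c for c in v]  # (u; -v) / sqrt(2)

res_plus = residual(B, theta_plus, sigma)     # eigenvalue +sigma
res_minus = residual(B, theta_minus, -sigma)  # eigenvalue -sigma
```

Both residuals vanish up to rounding, matching the pairing of each singular value with the two eigenvalues of opposite sign.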

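It will be assumed in what follows that $A$ is perturbed by a random
matrix $X \in \mathbb{R}^{m\times n}$ with i.i.d. entries $X_{ij} \sim \mathcal{N}(0,\tau^2)$ for some
$\tau > 0$. Given the SVD of the perturbed matrix

$$\tilde A = A + X = \sum_{i=1}^{m\wedge n} \tilde\sigma_i (\tilde u_i \otimes \tilde v_i),$$

our main interest lies in estimating singular vectors $u_i$ and $v_i$ of the matrix
$A$ in the case when its singular values $\sigma_i$ are distinct, or, more generally,
in estimating the operators $P_k^{uu}$, $P_k^{uv}$, $P_k^{vu}$, $P_k^{vv}$. To this end, we will use
the estimators

$$\tilde P_k^{uu} := \sum_{i\in\Delta_k} (\tilde u_i \otimes \tilde u_i), \qquad \tilde P_k^{uv} := \sum_{i\in\Delta_k} (\tilde u_i \otimes \tilde v_i),$$

$$\tilde P_k^{vu} := \sum_{i\in\Delta_k} (\tilde v_i \otimes \tilde u_i), \qquad \tilde P_k^{vv} := \sum_{i\in\Delta_k} (\tilde v_i \otimes \tilde v_i),$$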


and our main goal will be to study the fluctuations of the bilinear forms
of these random operators around the bilinear forms of operators $P_k^{uu}$, $P_k^{uv}$,
$P_k^{vu}$, $P_k^{vv}$. In the case when the singular values of $A$ are distinct, this would
allow us to study the fluctuations of linear forms of singular vectors $\tilde u_i, \tilde v_i$
around the corresponding linear forms of $u_i, v_i$, which would provide a way
to control the fluctuations of components of “empirical” singular vectors in
a given basis around their true counterparts. Clearly, the problem can be
and will be reduced to the analysis of spectral representation of a symmetric
random matrix

$$\tilde B = \Lambda(\tilde A) = \begin{pmatrix} 0 & \tilde A \\ \tilde A' & 0 \end{pmatrix} = B + \Gamma, \quad\text{where } \Gamma = \Lambda(X) = \begin{pmatrix} 0 & X \\ X' & 0 \end{pmatrix}, \tag{1.3}$$
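that can be viewed as a random perturbation of the symmetric matrix $B$.
The spectral representation of this matrix can be written in the form

$$\tilde B = \sum_{1\le|i|\le(m\wedge n)} \tilde\sigma_i (\tilde\theta_i \otimes \tilde\theta_i),$$

where

$$\tilde\sigma_{-i} = -\tilde\sigma_i, \quad \tilde\theta_i := \frac{1}{\sqrt 2}\begin{pmatrix} \tilde u_i \\ \tilde v_i \end{pmatrix}, \quad \tilde\theta_{-i} := \frac{1}{\sqrt 2}\begin{pmatrix} \tilde u_i \\ -\tilde v_i \end{pmatrix}, \quad i = 1,\dots,(m\wedge n).$$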


If the operator norm $\|\Gamma\|$ of the “noise” matrix $\Gamma$ is small enough compared
with the “spectral gap” of the $k$-th eigenvalue $\mu_k$ of $B$ (for some $k = 1,\dots,d$),
then it is easy to see that $\tilde P_k := \sum_{i\in\Delta_k} (\tilde\theta_i \otimes \tilde\theta_i)$ is the orthogonal
projector on the direct sum of eigenspaces of $\tilde B$ corresponding to the “cluster”
$\{\tilde\sigma_i : i \in \Delta_k\}$ of its eigenvalues localized in a neighborhood of $\mu_k$.
Moreover,

$$\tilde P_k = \frac{1}{2}\begin{pmatrix} \tilde P_k^{uu} & \tilde P_k^{uv} \\ \tilde P_k^{vu} & \tilde P_k^{vv} \end{pmatrix}.$$

Thus, it is enough to study the fluctuations of bilinear
forms of random orthogonal projectors $\tilde P_k$ around the corresponding bilinear
form of the spectral projectors $P_k$ to derive similar properties of operators
$\tilde P_k^{uu}$, $\tilde P_k^{uv}$, $\tilde P_k^{vu}$, $\tilde P_k^{vv}$.

We will be interested in bounding the bilinear forms of operators $\tilde P_k - P_k$
for $k = 1,\dots,d$. To this end, we will provide separate bounds on the random
error $\tilde P_k - \mathbb{E}\tilde P_k$ and on the bias $\mathbb{E}\tilde P_k - P_k$. For $k = 1,\dots,d$, $\bar g_k$ denotes the
distance from the eigenvalue $\mu_k$ to the rest of the spectrum of $A$ (the eigengap
of $\mu_k$). More specifically, for $2 \le k \le d-1$,
$\bar g_k = \min(\mu_k - \mu_{k+1}, \mu_{k-1} - \mu_k)$,
$\bar g_1 = \mu_1 - \mu_2$ and $\bar g_d = \min(\mu_{d-1} - \mu_d, \mu_d)$.
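The eigengap definition translates directly into a short routine. The following is a minimal sketch in plain Python; the function name `eigengaps`, the input validation, and the restriction to at least two distinct singular values are assumptions of this example.

```python
def eigengaps(mu):
    """Eigengaps for distinct singular values mu[0] > mu[1] > ... > mu[d-1] > 0.

    Illustrative helper: assumes d >= 2, since the formula for the first gap
    refers to the second singular value.
    """
    d = len(mu)
    if d < 2:
        raise ValueError("need at least two distinct singular values")
    if mu[-1] <= 0 or any(mu[i] <= mu[i + 1] for i in range(d - 1)):
        raise ValueError("values must be positive and strictly decreasing")
    gaps = []
    for k in range(d):
        if k == 0:
            gaps.append(mu[0] - mu[1])
        elif k == d - 1:
            # the smallest value is also compared with its distance to zero
            gaps.append(min(mu[d - 2] - mu[d - 1], mu[d - 1]))
        else:
            gaps.append(min(mu[k] - mu[k + 1], mu[k - 1] - mu[k]))
    return gaps


example_gaps = eigengaps([5.0, 3.0, 2.0])
```

For the sample input the interior gap is governed by the nearer neighbour, and the last gap by the smaller of the neighbour distance and the value itself.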

The main assumption in the results that follow is that $\mathbb{E}\|X\| < \frac{\bar g_k}{2}$ (more
precisely, $\mathbb{E}\|X\| \le (1-\gamma)\frac{\bar g_k}{2}$ for a positive $\gamma$). In view of the
concentration inequality of Lemma 2.1 in the next section, this essentially
means that the operator norm of the random perturbation matrix $\|\Gamma\| = \|X\|$ is
strictly smaller than one half of the spectral gap $\bar g_k$ of singular value $\mu_k$.
Since, again by Lemma 2.1, $\mathbb{E}\|X\| \asymp \tau\sqrt{m\vee n}$, this assumption also means that
$\bar g_k \gtrsim \tau\sqrt{m\vee n}$ (so, the spectral gap $\bar g_k$ is sufficiently large). Our goal is to
prove that, under this assumption, the values of bilinear form $\langle \tilde P_k x, y\rangle$ of
random spectral projector $\tilde P_k$ have tight concentration around their means
(with the magnitude of deviations of the order $\frac{\tau}{\bar g_k}$). We will also show
that the bias $\mathbb{E}\tilde P_k - P_k$ of the spectral projector $\tilde P_k$ is “aligned” with the
spectral projector $P_k$ (up to an error of the order $\frac{\tau^2\sqrt{m\vee n}}{\bar g_k^2}$ in the operator
norm). More precisely, the following results hold.


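Theorem 1.1. Suppose that for some $\gamma \in (0,1)$, $\mathbb{E}\|X\| \le (1-\gamma)\frac{\bar g_k}{2}$. There
exists a constant $D_\gamma > 0$ such that, for all $x, y \in \mathbb{R}^{m+n}$ and for all $t \ge 1$, the
following inequality holds with probability at least $1 - e^{-t}$:

$$\bigl|\langle (\tilde P_k - \mathbb{E}\tilde P_k)x, y\rangle\bigr| \le D_\gamma \frac{\tau\sqrt t}{\bar g_k}\Bigl(\frac{\tau\sqrt{m\vee n} + \tau\sqrt t}{\bar g_k} + 1\Bigr)\|x\|\|y\|. \tag{1.4}$$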
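Assuming that $t \lesssim m\vee n$ and taking into account that
$\tau\sqrt{m\vee n} \asymp \mathbb{E}\|X\| \le \bar g_k$, we easily get from the bound of Theorem 1.1 that

$$\bigl|\langle (\tilde P_k - \mathbb{E}\tilde P_k)x, y\rangle\bigr| \lesssim_\gamma \frac{\tau\sqrt t}{\bar g_k}\|x\|\|y\| \lesssim \sqrt{\frac{t}{m\vee n}}\,\|x\|\|y\|,$$

so the fluctuations of $\langle \tilde P_k x, y\rangle$ around its expectation are indeed of the order
$\frac{1}{\sqrt{m\vee n}}$.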
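To get a feel for how the bound of Theorem 1.1 scales, the sketch below evaluates the shape of its right-hand side with the unknown constant $D_\gamma$ and the norms $\|x\|\|y\|$ left out. It is only an illustration in plain Python; the function name `fluctuation_rate` and the sample parameter values are assumptions, and the output is a rate, not a usable numerical bound.

```python
import math


def fluctuation_rate(tau, gap, m, n, t):
    """Shape of the right-hand side of (1.4), without D_gamma and ||x|| ||y||.

    tau: noise standard deviation, gap: eigengap, (m, n): matrix dimensions,
    t >= 1: confidence parameter. Illustrative helper only.
    """
    if t < 1:
        raise ValueError("the theorem is stated for t >= 1")
    lead = tau * math.sqrt(t) / gap
    correction = (tau * math.sqrt(max(m, n)) + tau * math.sqrt(t)) / gap
    return lead * (correction + 1.0)


# Hypothetical values: tau = 1, gap = 10, a 100 x 50 matrix, t = 4.
rate = fluctuation_rate(1.0, 10.0, 100, 50, 4.0)
```

When the gap dominates $\tau\sqrt{m\vee n}$, the bracket stays bounded and the leading factor $\tau\sqrt t/\bar g_k$ drives the rate, in line with the simplification above.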


The next result shows that the bias $\mathbb{E}\tilde P_k - P_k$ of $\tilde P_k$ can be represented
as a sum of a “low rank part” $P_k(\mathbb{E}\tilde P_k - P_k)P_k$ and a small remainder.


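Theorem 1.2. The following bound holds with some constant $D > 0$:

$$\bigl\|\mathbb{E}\tilde P_k - P_k\bigr\| \le D\,\frac{\tau^2(m\vee n)}{\bar g_k^2}. \tag{1.5}$$

Moreover, suppose that for some $\gamma \in (0,1)$, $\mathbb{E}\|X\| \le (1-\gamma)\frac{\bar g_k}{2}$. Then, there
exists a constant $C_\gamma > 0$ such that

$$\bigl\|\mathbb{E}\tilde P_k - P_k - P_k(\mathbb{E}\tilde P_k - P_k)P_k\bigr\| \le C_\gamma\,\frac{\nu_k\tau^2\sqrt{m\vee n}}{\bar g_k^2}. \tag{1.6}$$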


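Since, under the assumption $\mathbb{E}\|X\| \le (1-\gamma)\frac{\bar g_k}{2}$, we have $\bar g_k \gtrsim \tau\sqrt{m\vee n}$,
bound (1.6) implies that the following representation holds

$$\mathbb{E}\tilde P_k - P_k = P_k(\mathbb{E}\tilde P_k - P_k)P_k + T_k$$

with the remainder $T_k$ satisfying the bound

$$\|T_k\| \lesssim_\gamma \frac{\nu_k\tau^2\sqrt{m\vee n}}{\bar g_k^2} \lesssim \frac{\nu_k}{\sqrt{m\vee n}}.$$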


We will now consider a special case when $\mu_k$ has multiplicity 1 ($\nu_k = 1$).
In this case, $\Delta_k = \{i_k\}$ for some $i_k \in \{1,\dots,(m\wedge n)\}$ and
$P_k = \theta_{i_k} \otimes \theta_{i_k}$. Let
$\tilde P_k := \tilde\theta_{i_k} \otimes \tilde\theta_{i_k}$. Note that on the event $\|\Gamma\| = \|X\| < \frac{\bar g_k}{2}$ that is
assumed to hold with a high probability, the multiplicity of $\tilde\sigma_{i_k}$ is also 1
(see the discussion in the next section after Lemma 2.2). Note also that the unit
eigenvectors $\theta_{i_k}$, $\tilde\theta_{i_k}$ are defined only up to their signs. Due to this, we will
assume without loss of generality that $\langle \tilde\theta_{i_k}, \theta_{i_k}\rangle \ge 0$.

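Since $P_k = \theta_{i_k} \otimes \theta_{i_k}$ is an operator of rank 1, we have

$$P_k(\mathbb{E}\tilde P_k - P_k)P_k = b_kP_k,$$

where

$$b_k := \bigl\langle (\mathbb{E}\tilde P_k - P_k)\theta_{i_k}, \theta_{i_k}\bigr\rangle = \mathbb{E}\langle \tilde\theta_{i_k}, \theta_{i_k}\rangle^2 - 1.$$

Therefore,

$$\mathbb{E}\tilde P_k = (1 + b_k)P_k + T_k$$

and $b_k$ turns out to be the main parameter characterizing the bias of $\tilde P_k$.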
Clearly, $b_k \in [-1, 0]$ (note that $b_k = 0$ is equivalent to $\tilde\theta_{i_k} = \theta_{i_k}$ a.s. and
$b_k = -1$ is equivalent to $\tilde\theta_{i_k} \perp \theta_{i_k}$ a.s.). On the other hand, by bound (1.5)
of Theorem 1.2,

$$|b_k| \le \bigl\|\mathbb{E}\tilde P_k - P_k\bigr\| \lesssim \frac{\tau^2(m\vee n)}{\bar g_k^2}. \tag{1.7}$$


In the next theorem, it will be assumed that the bias is not too large in the
sense that $b_k$ is bounded away by a constant $\gamma > 0$ from $-1$.


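Theorem 1.3. Suppose that, for some $\gamma \in (0,1)$, $\mathbb{E}\|X\| \le (1-\gamma)\frac{\bar g_k}{2}$ and
$1 + b_k \ge \gamma$. Then, for all $x \in \mathbb{R}^{m+n}$ and for all $t \ge 1$ with probability at least
$1 - e^{-t}$,

$$\bigl|\langle \tilde\theta_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, x\rangle\bigr| \lesssim_\gamma \frac{\tau\sqrt t}{\bar g_k}\Bigl(\frac{\tau\sqrt{m\vee n} + \tau\sqrt t}{\bar g_k} + 1\Bigr)\|x\|.$$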

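Assuming that $t \lesssim m\vee n$, the bound of Theorem 1.3 implies that

$$\bigl|\langle \tilde\theta_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, x\rangle\bigr| \lesssim_\gamma \frac{\tau\sqrt t}{\bar g_k}\|x\| \lesssim \sqrt{\frac{t}{m\vee n}}\,\|x\|.$$

Therefore, the fluctuations of $\langle \tilde\theta_{i_k}, x\rangle$ around $\sqrt{1+b_k}\langle \theta_{i_k}, x\rangle$ are of the
order $\frac{1}{\sqrt{m\vee n}}$.

Recall that $\theta_{i_k} := \frac{1}{\sqrt 2}\begin{pmatrix} u_{i_k} \\ v_{i_k}\end{pmatrix}$, where $u_{i_k}, v_{i_k}$ are left and right singular
vectors of $A$ corresponding to its singular value $\mu_k$. Theorem 1.3 easily implies
the following corollary.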


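Corollary 1.4. Under the conditions of Theorem 1.3, with probability at least
$1 - \frac{1}{m+n}$,

$$\max\Bigl\{\bigl\|\tilde u_{i_k} - \sqrt{1+b_k}\,u_{i_k}\bigr\|_\infty, \bigl\|\tilde v_{i_k} - \sqrt{1+b_k}\,v_{i_k}\bigr\|_\infty\Bigr\} \lesssim_\gamma \frac{\tau}{\bar g_k}\sqrt{\log(m+n)}.$$

For the proof, it is enough to take $t = 2\log(m+n)$, $x = e_i^{m+n}$, $i = 1,\dots,(m+n)$
and to use the bound of Theorem 1.3 along with the union
bound. Then recalling that $\theta_{i_k} = \frac{1}{\sqrt 2}\begin{pmatrix} u_{i_k} \\ v_{i_k}\end{pmatrix}$, Theorem 1.3 easily implies
the claim.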

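Theorem 1.3 shows that the “naive estimator” $\langle \tilde\theta_{i_k}, x\rangle$ of linear form
$\langle \theta_{i_k}, x\rangle$ could be improved by reducing its bias that, in principle, could be
done by its simple rescaling $\langle \tilde\theta_{i_k}, x\rangle \mapsto \langle (1+b_k)^{-1/2}\tilde\theta_{i_k}, x\rangle$. Of course, the
difficulty with this approach is related to the fact that the bias parameter
$b_k$ is unknown. We will outline below a simple approach based on repeated
observations of matrix $A$. More specifically, let $\tilde A^1 = A + X^1$ and $\tilde A^2 = A + X^2$
be two independent copies of $\tilde A$ and denote $\tilde B^1 = \Lambda(\tilde A^1)$, $\tilde B^2 = \Lambda(\tilde A^2)$. Let
$\tilde\theta^1_{i_k}$ and $\tilde\theta^2_{i_k}$ be the eigenvectors of $\tilde B^1$ and $\tilde B^2$ corresponding to their
eigenvalues $\tilde\sigma^1_{i_k}$, $\tilde\sigma^2_{i_k}$. The signs of $\tilde\theta^1_{i_k}$ and $\tilde\theta^2_{i_k}$ are chosen so that
$\langle \tilde\theta^1_{i_k}, \tilde\theta^2_{i_k}\rangle \ge 0$. Let

$$\tilde b_k := \langle \tilde\theta^1_{i_k}, \tilde\theta^2_{i_k}\rangle - 1. \tag{1.8}$$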
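The two-sample bias estimate can be illustrated on simulated data. The sketch below is an assumption-laden toy in plain Python: it uses power iteration on $\tilde A'\tilde A$ as a simple stand-in for a full SVD (this numerical method is not taken from the paper), treats only the leading singular value, and uses hypothetical dimensions, signal strength and noise level. All function names are illustrative.

```python
import math
import random


def top_singular_pair(a, iters=200):
    """Approximate leading singular vectors (u, v) by power iteration on A'A.

    A stand-in for an SVD routine; assumes a well separated top singular value.
    """
    m, n = len(a), len(a[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(a[i][j] * av[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(c * c for c in av))
    u = [c / sigma for c in av]
    return u, v


def noisy_copy(a, tau, rng):
    """Return A + X with i.i.d. N(0, tau^2) entries in X."""
    return [[entry + rng.gauss(0.0, tau) for entry in row] for row in a]


def stacked_eigenvector(u, v):
    """theta = (u; v) / sqrt(2), a unit vector when u and v are unit vectors."""
    scale = 1.0 / math.sqrt(2.0)
    return [scale * c for c in u + v]


def bias_estimate(theta1, theta2):
    """Inner product minus one, after choosing signs so the product is >= 0."""
    inner = sum(p * q for p, q in zip(theta1, theta2))
    return abs(inner) - 1.0


rng = random.Random(0)           # fixed seed, for reproducibility only
u = [0.5, 0.5, 0.5, 0.5]         # hypothetical unit left singular vector
v = [0.6, 0.8, 0.0]              # hypothetical unit right singular vector
A = [[10.0 * ui * vj for vj in v] for ui in u]

theta1 = stacked_eigenvector(*top_singular_pair(noisy_copy(A, 0.01, rng)))
theta2 = stacked_eigenvector(*top_singular_pair(noisy_copy(A, 0.01, rng)))
b_tilde = bias_estimate(theta1, theta2)
```

With a large gap relative to the noise level, the two empirical eigenvectors nearly coincide, so the estimate is a small non-positive number.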
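Given $\gamma > 0$, define

$$\hat\theta^{(\gamma)}_{i_k} := \frac{\tilde\theta^1_{i_k}}{\sqrt{1+\tilde b_k} \vee \frac{\sqrt\gamma}{2}}.$$

Corollary 1.5. Under the assumptions of Theorem 1.3, there exists a constant
$D_\gamma > 0$ such that for all $x \in \mathbb{R}^{m+n}$ and all $t \ge 1$ with probability at least
$1 - e^{-t}$,

$$|\tilde b_k - b_k| \le D_\gamma\,\frac{\tau\sqrt t}{\bar g_k}\Bigl[\frac{\tau\sqrt{m\vee n} + \tau\sqrt t}{\bar g_k} + 1\Bigr] \tag{1.9}$$

and

$$\bigl|\langle \hat\theta^{(\gamma)}_{i_k} - \theta_{i_k}, x\rangle\bigr| \le D_\gamma\,\frac{\tau\sqrt t}{\bar g_k}\Bigl[\frac{\tau\sqrt{m\vee n} + \tau\sqrt t}{\bar g_k} + 1\Bigr]\|x\|. \tag{1.10}$$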


Note that $\hat\theta^{(\gamma)}_{i_k}$ is not necessarily a unit vector. However, its linear form
provides a better approximation of the linear forms of $\theta_{i_k}$ than in the case
of vector $\tilde\theta^1_{i_k}$ that is properly normalized. Clearly, the result implies similar
bounds for the singular vectors $\hat u^{(\gamma)}_{i_k}$ and $\hat v^{(\gamma)}_{i_k}$.


2. Proofs of the main results 

The proofs follow the approach of Koltchinskii and Lounici [6] who did a
similar analysis in the problem of estimation of spectral projectors of sample
covariance. We start with discussing several preliminary facts used in what
follows. Lemma 2.1 and Lemma 2.2 below provide moment bounds and a
concentration inequality for $\|\Gamma\| = \|X\|$. The bound on $\mathbb{E}\|X\|$ of Lemma 2.1
is available in many references (see, e.g., Vershynin). The concentration
bound for $\|X\|$ is a straightforward consequence of the Gaussian concentration
inequality. The moment bounds of Lemma 2.2 can be easily proved
by integrating out the tails of the exponential bound that follows from the
concentration inequality of Lemma 2.1.

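Lemma 2.1. There exist absolute constants $c_0, c_1, c_2 > 0$ such that

$$c_0\tau\sqrt{m\vee n} \le \mathbb{E}\|X\| \le c_1\tau\sqrt{m\vee n}$$

and for all $t > 0$,

$$\mathbb{P}\bigl\{\bigl|\|X\| - \mathbb{E}\|X\|\bigr| \ge c_2\tau\sqrt t\bigr\} \le e^{-t}.$$

Lemma 2.2. For all $p \ge 1$, it holds that

$$\mathbb{E}^{1/p}\|X\|^p \asymp \tau\sqrt{m\vee n}.$$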

According to a well-known result that goes back to Weyl, for symmetric
(or Hermitian) $N \times N$ matrices $C, D$

$$\max_{1\le j\le N} \bigl|\lambda_j^{\downarrow}(C) - \lambda_j^{\downarrow}(D)\bigr| \le \|C - D\|,$$

where $\lambda^{\downarrow}(C)$, $\lambda^{\downarrow}(D)$ denote the vectors consisting of the eigenvalues of
matrices $C, D$, respectively, arranged in a non-increasing order. This immediately
implies that, for all $k = 1,\dots,d$,

$$\max_{j\in\Delta_k} |\tilde\sigma_j - \mu_k| \le \|\Gamma\|$$

and

$$\min_{j\in\bigcup_{k'\neq k}\Delta_{k'}} |\tilde\sigma_j - \mu_k| \ge \bar g_k - \|\Gamma\|.$$

Assuming that $\|\Gamma\| < \frac{\bar g_k}{2}$, we get that
$\{\tilde\sigma_j : j \in \Delta_k\} \subset (\mu_k - \bar g_k/2, \mu_k + \bar g_k/2)$
and the rest of the eigenvalues of $\tilde B$ are outside of this interval. Moreover, if
$\|\Gamma\| < \frac{\bar g_k}{4}$, then the cluster of eigenvalues $\{\tilde\sigma_j : j \in \Delta_k\}$ is localized inside a
shorter interval $(\mu_k - \bar g_k/4, \mu_k + \bar g_k/4)$ of radius $\bar g_k/4$ and its distance from
the rest of the spectrum of $\tilde B$ is $> \frac{1}{2}\bar g_k$. These simple considerations allow us
to view the projection operator $\tilde P_k = \sum_{j\in\Delta_k} (\tilde\theta_j \otimes \tilde\theta_j)$ as a projector on the
direct sum of eigenspaces of $\tilde B$ corresponding to its eigenvalues located in a
“small” neighborhood of the eigenvalue $\mu_k$ of $B$, which makes $\tilde P_k$ a natural
estimator of $P_k$.

Define operators $C_k$ as follows:

$$C_k := \sum_{s\neq k} \frac{1}{\mu_s - \mu_k}\,P_s.$$

In the case when $2\sum_{k=1}^{d} \nu_k < m+n$ and, hence, $\mu_0 = 0$ is also an eigenvalue
of $B$, it will be assumed that the above sum includes $s = 0$ with $P_0$ being the
corresponding spectral projector.

The next simple lemma can be found, for instance, in Koltchinskii and
Lounici [6]. Its proof is based on a standard perturbation analysis utilizing
the Riesz formula for spectral projectors.


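Lemma 2.3. The following bound holds:

$$\|\tilde P_k - P_k\| \le 4\,\frac{\|\Gamma\|}{\bar g_k}.$$

Moreover,

$$\tilde P_k - P_k = L_k(\Gamma) + S_k(\Gamma),$$

where $L_k(\Gamma) := C_k\Gamma P_k + P_k\Gamma C_k$ and

$$\|S_k(\Gamma)\| \le 14\Bigl(\frac{\|\Gamma\|}{\bar g_k}\Bigr)^2.$$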

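Proof of Theorem 1.1. Since $\mathbb{E}L_k(\Gamma) = 0$, it is easy to check that

$$\tilde P_k - \mathbb{E}\tilde P_k = L_k(\Gamma) + S_k(\Gamma) - \mathbb{E}S_k(\Gamma) =: L_k(\Gamma) + R_k(\Gamma). \tag{2.1}$$

We will first provide a bound on the bilinear form of the remainder
$\langle R_k(\Gamma)x, y\rangle$. Note that

$$\langle R_k(\Gamma)x, y\rangle = \langle S_k(\Gamma)x, y\rangle - \langle \mathbb{E}S_k(\Gamma)x, y\rangle$$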



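is a function of the random matrix $X \in \mathbb{R}^{m\times n}$ since $\Gamma = \Lambda(X)$ (see (1.3)).
When we need to emphasize this dependence, we will write $\Gamma_X$ instead of $\Gamma$.
With some abuse of notation, we will view $X$ as a point in $\mathbb{R}^{m\times n}$ rather than
a random variable.

Let $0 < \gamma < 1$ and define a function $h_{x,y,\delta}(\cdot) : \mathbb{R}^{m\times n} \to \mathbb{R}$ as follows:

$$h_{x,y,\delta}(X) := \langle S_k(\Gamma_X)x, y\rangle\,\varphi\Bigl(\frac{\|\Gamma_X\|}{\delta}\Bigr),$$

where $\varphi$ is a Lipschitz function with constant $\frac{1}{\gamma}$ on $\mathbb{R}_+$ and $0 \le \varphi(s) \le 1$.
More precisely, assume that $\varphi(s) = 1$, $s \le 1$, $\varphi(s) = 0$, $s \ge (1+\gamma)$ and $\varphi$ is
linear in between. We will prove that the function $X \mapsto h_{x,y,\delta}(X)$ satisfies the
Lipschitz condition. Note that

$$\bigl|\langle (S_k(\Gamma_{X_1}) - S_k(\Gamma_{X_2}))x, y\rangle\bigr| \le \|S_k(\Gamma_{X_1}) - S_k(\Gamma_{X_2})\|\,\|x\|\|y\|.$$

To control the norm $\|S_k(\Gamma_{X_1}) - S_k(\Gamma_{X_2})\|$, we need to apply Lemma 4 from
[6]. It is stated below without the proof.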
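The piecewise linear cutoff used in this truncation argument is simple enough to write down explicitly. The following is a minimal sketch in plain Python; the function name `cutoff` is an assumption of this example.

```python
def cutoff(s, gamma):
    """Piecewise linear cutoff: 1 on [0, 1], 0 on [1 + gamma, inf), linear between.

    The slope on the middle piece is -1/gamma, so the function is Lipschitz
    with constant 1/gamma. Illustrative helper only.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    if s <= 1.0:
        return 1.0
    if s >= 1.0 + gamma:
        return 0.0
    return (1.0 + gamma - s) / gamma


mid_value = cutoff(1.25, 0.5)  # halfway along the linear piece
```

Multiplying by this factor leaves the bilinear form unchanged when the noise norm is at most $\delta$ and switches it off once the norm exceeds $(1+\gamma)\delta$, which is what makes a global Lipschitz bound possible.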

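Lemma 2.4. Let $\gamma \in (0,1)$ and suppose that $\delta \le \frac{1-\gamma}{1+\gamma}\frac{\bar g_k}{2}$. There exists a
constant $C_\gamma > 0$ such that, for all symmetric $\Gamma_1, \Gamma_2 \in \mathbb{R}^{(m+n)\times(m+n)}$ satisfying
the conditions $\|\Gamma_1\| \le (1+\gamma)\delta$ and $\|\Gamma_2\| \le (1+\gamma)\delta$,

$$\|S_k(\Gamma_1) - S_k(\Gamma_2)\| \le C_\gamma\,\frac{\delta}{\bar g_k^2}\,\|\Gamma_1 - \Gamma_2\|.$$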

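We now derive the Lipschitz condition for the function $X \mapsto h_{x,y,\delta}(X)$.

Lemma 2.5. Under the assumption that $\delta \le \frac{1-\gamma}{1+\gamma}\frac{\bar g_k}{2}$, there exists a constant
$C_\gamma > 0$ such that

$$|h_{x,y,\delta}(X_1) - h_{x,y,\delta}(X_2)| \le C_\gamma\,\frac{\delta\|X_1 - X_2\|_2}{\bar g_k^2}\,\|x\|\|y\|. \tag{2.2}$$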


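Proof. Suppose first that $\max(\|\Gamma_{X_1}\|, \|\Gamma_{X_2}\|) \le (1+\gamma)\delta$. Using Lemma 2.4
and Lipschitz properties of function $\varphi$, we get

$$|h_{x,y,\delta}(X_1) - h_{x,y,\delta}(X_2)| = \Bigl|\langle S_k(\Gamma_{X_1})x, y\rangle\varphi\Bigl(\frac{\|\Gamma_{X_1}\|}{\delta}\Bigr) - \langle S_k(\Gamma_{X_2})x, y\rangle\varphi\Bigl(\frac{\|\Gamma_{X_2}\|}{\delta}\Bigr)\Bigr|$$

$$\le \|S_k(\Gamma_{X_1}) - S_k(\Gamma_{X_2})\|\,\|x\|\|y\|\,\varphi\Bigl(\frac{\|\Gamma_{X_1}\|}{\delta}\Bigr) + \|S_k(\Gamma_{X_2})\|\,\Bigl|\varphi\Bigl(\frac{\|\Gamma_{X_1}\|}{\delta}\Bigr) - \varphi\Bigl(\frac{\|\Gamma_{X_2}\|}{\delta}\Bigr)\Bigr|\,\|x\|\|y\|$$

$$\le C_\gamma\,\frac{\delta\|\Gamma_{X_1} - \Gamma_{X_2}\|}{\bar g_k^2}\,\|x\|\|y\| + 14\,\frac{(1+\gamma)^2\delta^2}{\bar g_k^2}\,\frac{\|\Gamma_{X_1} - \Gamma_{X_2}\|}{\gamma\delta}\,\|x\|\|y\|$$

$$\lesssim_\gamma \frac{\delta\|\Gamma_{X_1} - \Gamma_{X_2}\|}{\bar g_k^2}\,\|x\|\|y\| \lesssim_\gamma \frac{\delta\|X_1 - X_2\|_2}{\bar g_k^2}\,\|x\|\|y\|.$$

In the case when $\min(\|\Gamma_{X_1}\|, \|\Gamma_{X_2}\|) \ge (1+\gamma)\delta$, we have
$h_{x,y,\delta}(X_1) = h_{x,y,\delta}(X_2) = 0$, and (2.2) trivially holds. Finally, in the case when
$\|\Gamma_{X_1}\| \le (1+\gamma)\delta \le \|\Gamma_{X_2}\|$, we have

$$|h_{x,y,\delta}(X_1) - h_{x,y,\delta}(X_2)| = \Bigl|\langle S_k(\Gamma_{X_1})x, y\rangle\varphi\Bigl(\frac{\|\Gamma_{X_1}\|}{\delta}\Bigr)\Bigr|$$

$$= \Bigl|\langle S_k(\Gamma_{X_1})x, y\rangle\varphi\Bigl(\frac{\|\Gamma_{X_1}\|}{\delta}\Bigr) - \langle S_k(\Gamma_{X_1})x, y\rangle\varphi\Bigl(\frac{\|\Gamma_{X_2}\|}{\delta}\Bigr)\Bigr|$$

$$\le \|S_k(\Gamma_{X_1})\|\,\frac{\|\Gamma_{X_1} - \Gamma_{X_2}\|}{\gamma\delta}\,\|x\|\|y\| \le 14\Bigl(\frac{(1+\gamma)\delta}{\bar g_k}\Bigr)^2\frac{\|\Gamma_{X_1} - \Gamma_{X_2}\|}{\gamma\delta}\,\|x\|\|y\|$$

$$\lesssim_\gamma \frac{\delta\|X_1 - X_2\|_2}{\bar g_k^2}\,\|x\|\|y\|.$$

The case $\|\Gamma_{X_2}\| \le (1+\gamma)\delta \le \|\Gamma_{X_1}\|$ is similar. □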


Our next step is to apply the following concentration bound that easily
follows from the Gaussian isoperimetric inequality.

Lemma 2.6. Let $f : \mathbb{R}^{m\times n} \to \mathbb{R}$ be a function satisfying the following Lipschitz
condition with some constant $L > 0$:

$$|f(A_1) - f(A_2)| \le L\|A_1 - A_2\|_2, \quad A_1, A_2 \in \mathbb{R}^{m\times n}.$$

Suppose $X$ is a random $m \times n$ matrix with i.i.d. entries $X_{ij} \sim \mathcal{N}(0,\tau^2)$. Let
$M$ be a real number such that

$$\mathbb{P}\{f(X) \ge M\} \ge \frac{1}{4} \quad\text{and}\quad \mathbb{P}\{f(X) \le M\} \ge \frac{1}{4}.$$

Then there exists some constant $D_1 > 0$ such that for all $t \ge 1$,

$$\mathbb{P}\bigl\{|f(X) - M| \ge D_1L\tau\sqrt t\bigr\} \le e^{-t}.$$

The next lemma is the main ingredient in the proof of Theorem 1.1.
It provides a Bernstein type bound on the bilinear form $\langle R_k(\Gamma)x, y\rangle$ of the
remainder $R_k$ in the representation (2.1).

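Lemma 2.7. Suppose that, for some $\gamma \in (0,1)$, $\mathbb{E}\|\Gamma\| \le (1-\gamma)\frac{\bar g_k}{2}$. Then, there
exists a constant $D_\gamma > 0$ such that for all $x, y \in \mathbb{R}^{m+n}$ and all $t \ge \log(4)$,
the following inequality holds with probability at least $1 - e^{-t}$:

$$|\langle R_k(\Gamma)x, y\rangle| \le D_\gamma\,\frac{\tau\sqrt t}{\bar g_k}\Bigl(\frac{\tau\sqrt{m\vee n} + \tau\sqrt t}{\bar g_k}\Bigr)\|x\|\|y\|.$$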

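Proof. Define $\delta_{n,m}(t) := \mathbb{E}\|\Gamma\| + c_2\tau\sqrt t$. By the second bound of Lemma 2.1
with a proper choice of constant $c_2 > 0$, $\mathbb{P}\{\|\Gamma\| \ge \delta_{n,m}(t)\} \le e^{-t}$. We first
consider the case when $c_2\tau\sqrt t \le \frac{\gamma}{2}\frac{\bar g_k}{2}$, which implies that

$$\delta_{n,m}(t) \le (1 - \gamma/2)\frac{\bar g_k}{2} = \frac{1-\gamma'}{1+\gamma'}\,\frac{\bar g_k}{2}$$

for some $\gamma' \in (0,1)$ depending only on $\gamma$. Therefore, it enables us to use
Lemma 2.5 with $\delta := \delta_{n,m}(t)$. Recall that
$h_{x,y,\delta}(X) = \langle S_k(\Gamma)x, y\rangle\varphi\bigl(\frac{\|\Gamma\|}{\delta}\bigr)$
and let $M := \mathrm{Med}\bigl(\langle S_k(\Gamma)x, y\rangle\bigr)$. Observe that, for $t \ge \log(4)$,

$$\mathbb{P}\{h_{x,y,\delta}(X) \ge M\} \ge \mathbb{P}\{h_{x,y,\delta}(X) \ge M, \|\Gamma\| < \delta_{n,m}(t)\}$$

$$\ge \mathbb{P}\{\langle S_k(\Gamma)x, y\rangle \ge M\} - \mathbb{P}\{\|\Gamma\| \ge \delta_{n,m}(t)\} \ge \frac{1}{2} - e^{-t} \ge \frac{1}{4}$$

and, similarly, $\mathbb{P}\{h_{x,y,\delta}(X) \le M\} \ge \frac{1}{4}$. Therefore, by applying Lemmas 2.5 and
2.6, we conclude that with probability at least $1 - e^{-t}$,

$$|h_{x,y,\delta}(X) - M| \lesssim_\gamma \frac{\delta_{n,m}(t)\,\tau\sqrt t}{\bar g_k^2}\,\|x\|\|y\|.$$

Since, by the first bound of Lemma 2.1, $\delta_{n,m}(t) \lesssim \tau(\sqrt{m\vee n} + \sqrt t)$, we get
that with the same probability

$$|h_{x,y,\delta}(X) - M| \lesssim_\gamma \frac{\tau\sqrt t}{\bar g_k}\,\frac{\tau\sqrt{m\vee n} + \tau\sqrt t}{\bar g_k}\,\|x\|\|y\|.$$

Moreover, on the event $\{\|\Gamma\| < \delta_{n,m}(t)\}$ that holds with probability at least
$1 - e^{-t}$, $h_{x,y,\delta}(X) = \langle S_k(\Gamma)x, y\rangle$. Therefore, the following inequality holds
with probability at least $1 - 2e^{-t}$:

$$|\langle S_k(\Gamma)x, y\rangle - M| \lesssim_\gamma \frac{\tau\sqrt t}{\bar g_k}\,\frac{\tau\sqrt{m\vee n} + \tau\sqrt t}{\bar g_k}\,\|x\|\|y\|. \tag{2.3}$$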


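We still need to prove a similar inequality in the case $c_2\tau\sqrt t \ge \frac{\gamma}{2}\frac{\bar g_k}{2}$. In this
case,

$$\mathbb{E}\|\Gamma\| \le (1-\gamma)\frac{\bar g_k}{2} \le \frac{2(1-\gamma)}{\gamma}\,c_2\tau\sqrt t,$$

implying that $\delta_{n,m}(t) \lesssim_\gamma \tau\sqrt t$. It follows from Lemma 2.3 that

$$|\langle S_k(\Gamma)x, y\rangle| \le \|S_k(\Gamma)\|\,\|x\|\|y\| \lesssim \frac{\|\Gamma\|^2}{\bar g_k^2}\,\|x\|\|y\|.$$

This implies that with probability at least $1 - e^{-t}$,

$$|\langle S_k(\Gamma)x, y\rangle| \lesssim \frac{\delta_{n,m}^2(t)}{\bar g_k^2}\,\|x\|\|y\| \lesssim_\gamma \frac{\tau^2t}{\bar g_k^2}\,\|x\|\|y\|.$$

Since $t \ge \log(4)$ and $e^{-t} \le 1/4$, we can bound the median $M$ of $\langle S_k(\Gamma)x, y\rangle$
as follows:

$$M \lesssim_\gamma \frac{\tau^2t}{\bar g_k^2}\,\|x\|\|y\|,$$

which immediately implies that bound (2.3) holds under assumption
$c_2\tau\sqrt t \ge \frac{\gamma}{2}\frac{\bar g_k}{2}$ as well. By integrating out the tails of exponential bound (2.3),
we obtain that

$$|\mathbb{E}\langle S_k(\Gamma)x, y\rangle - M| \le \mathbb{E}|\langle S_k(\Gamma)x, y\rangle - M| \lesssim_\gamma \frac{\tau^2\sqrt{m\vee n}}{\bar g_k^2}\,\|x\|\|y\|,$$

which allows us to replace the median by the mean in concentration inequality
(2.3). To complete the proof, it remains to rewrite the probability bound
$1 - 2e^{-t}$ as $1 - e^{-t}$ by adjusting the value of the constant $D_\gamma$. □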

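Recalling that $\tilde P_k - \mathbb{E}\tilde P_k = L_k(\Gamma) + R_k(\Gamma)$, it remains to study the
concentration of $\langle L_k(\Gamma)x, y\rangle$.

Lemma 2.8. For all $x, y \in \mathbb{R}^{m+n}$ and $t > 0$,

$$\mathbb{P}\Bigl\{|\langle L_k(\Gamma)x, y\rangle| \ge 4\,\frac{\tau\sqrt t}{\bar g_k}\,\|x\|\|y\|\Bigr\} \le e^{-t}.$$

Proof. Recall that $L_k(\Gamma) = P_k\Gamma C_k + C_k\Gamma P_k$ implying that

$$\langle L_k(\Gamma)x, y\rangle = \langle \Gamma P_kx, C_ky\rangle + \langle \Gamma C_kx, P_ky\rangle.$$

If $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$, where $x_1, y_1 \in \mathbb{R}^m$, $x_2, y_2 \in \mathbb{R}^n$, then it is easy to
check that

$$\langle \Gamma x, y\rangle = \langle Xx_2, y_1\rangle + \langle Xy_2, x_1\rangle.$$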
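The block identity for the bilinear form of the dilated noise matrix can be verified on a small deterministic example. The sketch below is an illustration in plain Python; the helper names and the numerical values are assumptions of this example (the entries of `X` are fixed numbers here, not Gaussian draws, since the identity is purely algebraic).

```python
def dilation(a):
    """Symmetric block matrix [[0, X], [X', 0]] for an m x n list-of-rows `a`."""
    m, n = len(a), len(a[0])
    b = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            b[i][m + j] = a[i][j]
            b[m + j][i] = a[i][j]
    return b


def matvec(b, x):
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in b]


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


# m = 2, n = 3; x = (x1; x2), y = (y1; y2) with x1, y1 in R^2 and x2, y2 in R^3.
X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
x1, x2 = [1.0, 2.0], [0.5, -1.0, 2.0]
y1, y2 = [-1.0, 4.0], [2.0, 0.0, 1.0]

gamma_matrix = dilation(X)
lhs = dot(matvec(gamma_matrix, x1 + x2), y1 + y2)      # <Gamma x, y>
rhs = dot(matvec(X, x2), y1) + dot(matvec(X, y2), x1)  # <X x2, y1> + <X y2, x1>
```

Because the form is linear in the entries of `X`, it becomes a centered Gaussian variable once those entries are i.i.d. normal, which is the fact the variance computation relies on.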

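Clearly, the random variable $\langle \Gamma x, y\rangle$ is normal with mean zero and variance

$$\mathbb{E}\langle \Gamma x, y\rangle^2 \le 2\bigl[\mathbb{E}\langle Xx_2, y_1\rangle^2 + \mathbb{E}\langle Xy_2, x_1\rangle^2\bigr].$$

Since $X$ is an $m \times n$ matrix with i.i.d. $\mathcal{N}(0,\tau^2)$ entries, we easily get that

$$\mathbb{E}\langle Xx_2, y_1\rangle^2 = \mathbb{E}\langle X, y_1 \otimes x_2\rangle^2 = \tau^2\|y_1 \otimes x_2\|_2^2 = \tau^2\|x_2\|^2\|y_1\|^2$$

and, similarly,

$$\mathbb{E}\langle Xy_2, x_1\rangle^2 = \tau^2\|x_1\|^2\|y_2\|^2.$$

Therefore,

$$\mathbb{E}\langle \Gamma x, y\rangle^2 \le 2\tau^2\bigl[\|x_2\|^2\|y_1\|^2 + \|x_1\|^2\|y_2\|^2\bigr] \le 2\tau^2\bigl[(\|x_1\|^2 + \|x_2\|^2)(\|y_1\|^2 + \|y_2\|^2)\bigr] = 2\tau^2\|x\|^2\|y\|^2.$$

As a consequence, the random variable $\langle L_k(\Gamma)x, y\rangle$ is also normal with mean
zero and its variance is bounded from above as follows:

$$\mathbb{E}\langle L_k(\Gamma)x, y\rangle^2 \le 2\bigl[\mathbb{E}\langle \Gamma P_kx, C_ky\rangle^2 + \mathbb{E}\langle \Gamma C_kx, P_ky\rangle^2\bigr] \le 4\tau^2\bigl[\|P_kx\|^2\|C_ky\|^2 + \|C_kx\|^2\|P_ky\|^2\bigr].$$

Since $\|P_k\| \le 1$ and $\|C_k\| \le \frac{1}{\bar g_k}$, we get that

$$\mathbb{E}\langle L_k(\Gamma)x, y\rangle^2 \le \frac{8\tau^2}{\bar g_k^2}\,\|x\|^2\|y\|^2.$$

The bound of the lemma easily follows from standard tail bounds for normal
random variables. □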


The upper bound on $|\langle (\tilde P_k - \mathbb{E}\tilde P_k)x, y\rangle|$ claimed in Theorem 1.1 follows
by combining Lemma 2.7 and Lemma 2.8. □
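Proof of Theorem 1.2. Note that, since $\tilde P_k - P_k = L_k(\Gamma) + S_k(\Gamma)$ and
$\mathbb{E}L_k(\Gamma) = 0$, we have

$$\mathbb{E}\tilde P_k - P_k = \mathbb{E}S_k(\Gamma).$$

It follows from the bound on $\|S_k(\Gamma)\|$ of Lemma 2.3 that

$$\bigl\|\mathbb{E}\tilde P_k - P_k\bigr\| \le \mathbb{E}\|S_k(\Gamma)\| \le 14\,\frac{\mathbb{E}\|\Gamma\|^2}{\bar g_k^2} \tag{2.4}$$

and the bound of Lemma 2.2 implies that

$$\bigl\|\mathbb{E}\tilde P_k - P_k\bigr\| \lesssim \frac{\tau^2(m\vee n)}{\bar g_k^2},$$

which proves (1.5).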

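Let

$$\delta_{n,m} := \mathbb{E}\|\Gamma\| + c_2\tau\sqrt{\log(m+n)}.$$

It follows from Lemma 2.1 that, with a proper choice of constant $c_2 > 0$,

$$\mathbb{P}\bigl(\|\Gamma\| \ge \delta_{n,m}\bigr) \le \frac{1}{m+n}.$$

In the case when $c_2\tau\sqrt{\log(m+n)} > \frac{\gamma}{2}\frac{\bar g_k}{2}$, the proof of bound (1.6) is trivial.
Indeed, in this case

$$\bigl\|\mathbb{E}\tilde P_k - P_k\bigr\| \le \mathbb{E}\|\tilde P_k\| + \|P_k\| \le 2 \lesssim_\gamma \frac{\tau^2\log(m+n)}{\bar g_k^2} \lesssim \frac{\tau^2\sqrt{m\vee n}}{\bar g_k^2}.$$

Since $\bigl\|P_k(\mathbb{E}\tilde P_k - P_k)P_k\bigr\| \le \bigl\|\mathbb{E}\tilde P_k - P_k\bigr\|$, bound (1.6) of the theorem follows
when $c_2\tau\sqrt{\log(m+n)} > \frac{\gamma}{2}\frac{\bar g_k}{2}$.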

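In the rest of the proof, it will be assumed that $c_2\tau\sqrt{\log(m+n)} \le \frac{\gamma}{2}\frac{\bar g_k}{2}$
which, together with the condition $\mathbb{E}\|\Gamma\| = \mathbb{E}\|X\| \le (1-\gamma)\frac{\bar g_k}{2}$, implies that
$\delta_{n,m} \le (1 - \gamma/2)\frac{\bar g_k}{2}$. On the other hand, $\delta_{n,m} \lesssim \tau\sqrt{m\vee n}$. The following
decomposition of the bias $\mathbb{E}\tilde P_k - P_k$ is obvious:

$$\mathbb{E}\tilde P_k - P_k = \mathbb{E}S_k(\Gamma) = \mathbb{E}P_kS_k(\Gamma)P_k$$

$$+\ \mathbb{E}\bigl(P_k^\perp S_k(\Gamma)P_k + P_kS_k(\Gamma)P_k^\perp + P_k^\perp S_k(\Gamma)P_k^\perp\bigr)\mathbb{1}(\|\Gamma\| \le \delta_{n,m})$$

$$+\ \mathbb{E}\bigl(P_k^\perp S_k(\Gamma)P_k + P_kS_k(\Gamma)P_k^\perp + P_k^\perp S_k(\Gamma)P_k^\perp\bigr)\mathbb{1}(\|\Gamma\| > \delta_{n,m}). \tag{2.5}$$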

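We start with bounding the part of the expectation in the right hand side
of (2.5) that corresponds to the event $\{\|\Gamma\| \le \delta_{n,m}\}$ on which we also have
$\|\Gamma\| < \frac{\bar g_k}{2}$. Under this assumption, the eigenvalue $\mu_k$ of $B$ and the eigenvalues
$\sigma_j(\tilde B)$, $j \in \Delta_k$ of $\tilde B$ are inside the circle $\gamma_k$ in $\mathbb{C}$ with center $\mu_k$ and radius
$\frac{\bar g_k}{2}$. The rest of the eigenvalues of $B$, $\tilde B$ are outside of $\gamma_k$. According to the
Riesz formula for spectral projectors,

$$\tilde P_k = -\frac{1}{2\pi i}\oint_{\gamma_k} R_{\tilde B}(\eta)\,d\eta,$$

where $R_T(\eta) = (T - \eta I)^{-1}$, $\eta \in \mathbb{C}\setminus\sigma(T)$ denotes the resolvent of operator
$T$ ($\sigma(T)$ being its spectrum). It is also assumed that the contour $\gamma_k$ has
a counterclockwise orientation. Note that the resolvents will be viewed as
operators from $\mathbb{C}^{m+n}$ into itself. The following power series expansion is
standard:

$$R_{\tilde B}(\eta) = R_{B+\Gamma}(\eta) = (B + \Gamma - \eta I)^{-1}$$

$$= \bigl(I + R_B(\eta)\Gamma\bigr)^{-1}R_B(\eta) = \sum_{r\ge 0} (-1)^r[R_B(\eta)\Gamma]^rR_B(\eta),$$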

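where the series in the last line converges because
$\|R_B(\eta)\Gamma\| \le \|R_B(\eta)\|\|\Gamma\| < \frac{2}{\bar g_k}\frac{\bar g_k}{2} = 1$. The inequality
$\|R_B(\eta)\| \le \frac{2}{\bar g_k}$ holds for all $\eta \in \gamma_k$. One can easily verify that

$$P_k = -\frac{1}{2\pi i}\oint_{\gamma_k} R_B(\eta)\,d\eta,$$

$$L_k(\Gamma) = \frac{1}{2\pi i}\oint_{\gamma_k} R_B(\eta)\Gamma R_B(\eta)\,d\eta,$$

$$S_k(\Gamma) = -\frac{1}{2\pi i}\oint_{\gamma_k} \sum_{r\ge 2} (-1)^r[R_B(\eta)\Gamma]^rR_B(\eta)\,d\eta.$$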

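The following spectral representation of the resolvent will be used

$$R_B(\eta) = \sum_{s} \frac{1}{\mu_s - \eta}\,P_s,$$

where the sum in the right hand side includes $s = 0$ in the case when $\mu_0 = 0$
is an eigenvalue of $B$ (equivalently, in the case when $2\sum_{k=1}^{d} \nu_k < m+n$).
Define

$$\tilde R_B(\eta) := R_B(\eta) - \frac{1}{\mu_k - \eta}\,P_k = \sum_{s\neq k} \frac{1}{\mu_s - \eta}\,P_s.$$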


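Then, for $r \ge 2$,

$$P_k^\perp[R_B(\eta)\Gamma]^rR_B(\eta)P_k = \frac{1}{\mu_k - \eta}\,P_k^\perp[R_B(\eta)\Gamma]^rP_k$$

$$= \frac{1}{(\mu_k - \eta)^2}\sum_{s=2}^{r} (\tilde R_B(\eta)\Gamma)^{s-1}P_k\Gamma(R_B(\eta)\Gamma)^{r-s}P_k + \frac{1}{\mu_k - \eta}\,(\tilde R_B(\eta)\Gamma)^rP_k.$$

The above representation easily follows from the following simple observation:
let $a := \frac{1}{\mu_k - \eta}P_k\Gamma$ and $b := \tilde R_B(\eta)\Gamma$. Then

$$(a + b)^r = a(a + b)^{r-1} + b(a + b)^{r-1}$$

$$= a(a + b)^{r-1} + ba(a + b)^{r-2} + b^2(a + b)^{r-2}$$

$$= a(a + b)^{r-1} + ba(a + b)^{r-2} + b^2a(a + b)^{r-3} + b^3(a + b)^{r-3}$$

$$= \dots = \sum_{s=1}^{r} b^{s-1}a(a + b)^{r-s} + b^r.$$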
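The expansion of the power of a sum of two non-commuting elements can be checked with exact integer arithmetic. The following is a minimal sketch in plain Python on $2\times 2$ integer matrices; the helper names and the two sample matrices are assumptions of this example and are unrelated to the operators in the proof.

```python
def matmul(p, q):
    """Product of two square matrices given as lists of rows."""
    size = len(p)
    return [[sum(p[i][l] * q[l][j] for l in range(size)) for j in range(size)]
            for i in range(size)]


def matadd(p, q):
    return [[pij + qij for pij, qij in zip(prow, qrow)]
            for prow, qrow in zip(p, q)]


def matpow(p, r):
    """r-th power of a square matrix, with the identity for r = 0."""
    size = len(p)
    out = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for _ in range(r):
        out = matmul(out, p)
    return out


def expansion_rhs(a, b, r):
    """sum_{s=1}^{r} b^(s-1) a (a+b)^(r-s) + b^r, without assuming ab = ba."""
    total = matpow(b, r)
    a_plus_b = matadd(a, b)
    for s in range(1, r + 1):
        term = matmul(matmul(matpow(b, s - 1), a), matpow(a_plus_b, r - s))
        total = matadd(total, term)
    return total


# Two sample matrices that do not commute.
a = [[1, 2], [0, 1]]
b = [[0, 1], [3, -1]]
lhs_power = matpow(matadd(a, b), 4)
rhs_expansion = expansion_rhs(a, b, 4)
```

The identity peels off the leftmost factor one step at a time, which is why the order of the factors in each term matters.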
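As a result,

$$P_k^\perp S_k(\Gamma)P_k = -\sum_{r\ge 2} (-1)^r\frac{1}{2\pi i}\oint_{\gamma_k}\Bigl[\frac{1}{(\mu_k - \eta)^2}\sum_{s=2}^{r} (\tilde R_B(\eta)\Gamma)^{s-1}P_k\Gamma(R_B(\eta)\Gamma)^{r-s}P_k$$

$$+\ \frac{1}{\mu_k - \eta}\,(\tilde R_B(\eta)\Gamma)^rP_k\Bigr]\,d\eta. \tag{2.6}$$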

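Let $P_k = \sum_{l\in\Delta_k} \theta_l \otimes \theta_l$, where $\{\theta_l, l \in \Delta_k\}$ are orthonormal eigenvectors
corresponding to the eigenvalue $\mu_k$. Therefore, for any $y \in \mathbb{R}^{m+n}$,

$$(\tilde R_B(\eta)\Gamma)^{s-1}P_k\Gamma(R_B(\eta)\Gamma)^{r-s}P_ky = \sum_{l\in\Delta_k} (\tilde R_B(\eta)\Gamma)^{s-1}(\theta_l \otimes \theta_l)\Gamma(R_B(\eta)\Gamma)^{r-s}P_ky$$

$$= \sum_{l\in\Delta_k} \bigl\langle \Gamma(R_B(\eta)\Gamma)^{r-s}P_ky, \theta_l\bigr\rangle\,(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Gamma\theta_l. \tag{2.7}$$

Since $\bigl|\langle \Gamma(R_B(\eta)\Gamma)^{r-s}P_ky, \theta_l\rangle\bigr| \le \|\Gamma\|^{r-s+1}\|R_B(\eta)\|^{r-s}\|y\|$, we get

$$\mathbb{E}\bigl|\langle \Gamma(R_B(\eta)\Gamma)^{r-s}P_ky, \theta_l\rangle\bigr|^2\mathbb{1}(\|\Gamma\| \le \delta_{n,m}) \le \delta_{n,m}^{2(r-s+1)}\Bigl(\frac{2}{\bar g_k}\Bigr)^{2(r-s)}\|y\|^2.$$

Also, for any $x \in \mathbb{R}^{m+n}$, we have to bound

$$\mathbb{E}\bigl|\langle (\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Gamma\theta_l, x\rangle\bigr|^2\mathbb{1}(\|\Gamma\| \le \delta_{n,m}). \tag{2.8}$$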

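In what follows, we need some additional notations. Let
$X_1^c,\dots,X_n^c \sim \mathcal{N}(0,\tau^2I_m)$ be the i.i.d. columns of $X$ and
$(X_1^r)',\dots,(X_m^r)' \sim \mathcal{N}(0,\tau^2I_n)$
be its i.i.d. rows (here $I_m$ and $I_n$ are $m \times m$ and $n \times n$ identity
matrices). For $j = 1,\dots,n$, define the vector $\check X_j^c = ((X_j^c)', 0)' \in \mathbb{R}^{m+n}$,
representing the $(m+j)$-th column of matrix $\Gamma$. Similarly, for $i = 1,\dots,m$,
$\check X_i^r = (0, (X_i^r)')' \in \mathbb{R}^{m+n}$ represents the $i$-th row of $\Gamma$. With these notations,
the following representations of $\Gamma$ hold

$$\Gamma = \sum_{j=1}^{n} e_{m+j}^{m+n} \otimes \check X_j^c + \sum_{j=1}^{n} \check X_j^c \otimes e_{m+j}^{m+n},$$

$$\Gamma = \sum_{i=1}^{m} \check X_i^r \otimes e_i^{m+n} + \sum_{i=1}^{m} e_i^{m+n} \otimes \check X_i^r$$

and, moreover,

$$\sum_{j=1}^{n} e_{m+j}^{m+n} \otimes \check X_j^c = \sum_{i=1}^{m} \check X_i^r \otimes e_i^{m+n}, \qquad \sum_{j=1}^{n} \check X_j^c \otimes e_{m+j}^{m+n} = \sum_{i=1}^{m} e_i^{m+n} \otimes \check X_i^r.$$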
j—l i—l j—^ i—1 
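Therefore,

$$\bigl\langle (\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Gamma\theta_l, x\bigr\rangle = \sum_{j=1}^{n} \langle \check X_j^c, \theta_l\rangle\bigl\langle (\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)e_{m+j}^{m+n}, x\bigr\rangle$$

$$+ \sum_{j=1}^{n} \langle e_{m+j}^{m+n}, \theta_l\rangle\bigl\langle (\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\check X_j^c, x\bigr\rangle =: I_1(x) + I_2(x),$$

and we get

$$\mathbb{E}\bigl|\langle (\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Gamma\theta_l, x\rangle\bigr|^2\mathbb{1}(\|\Gamma\| \le \delta_{n,m}) \le 2\,\mathbb{E}\bigl(|I_1(x)|^2 + |I_2(x)|^2\bigr)\mathbb{1}(\|\Gamma\| \le \delta_{n,m}). \tag{2.9}$$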


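Observe that the random variable $(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)$ is a function of
$\{P_t\check X_j^c, t \neq k, j = 1,\dots,n\}$. Indeed, since $\tilde R_B(\eta)$ is a linear combination of
operators $P_t$, $t \neq k$, it is easy to see that $(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)$ can be represented
as a linear combination of operators

$$(P_{t_1}\Gamma P_{t_2})(P_{t_2}\Gamma P_{t_3})\cdots(P_{t_{s-2}}\Gamma P_{t_{s-1}})$$

with $t_j \neq k$ and with non-random complex coefficients. On the other hand,

$$P_{t_q}\Gamma P_{t_{q+1}} = \sum_{j=1}^{n} P_{t_q}e_{m+j}^{m+n} \otimes P_{t_{q+1}}\check X_j^c + \sum_{j=1}^{n} P_{t_q}\check X_j^c \otimes P_{t_{q+1}}e_{m+j}^{m+n}.$$

These two facts imply that $(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)$ is a function of
$\{P_t\check X_j^c, t \neq k, j = 1,\dots,n\}$. Similarly, it is also a function of
$\{P_t\check X_i^r, t \neq k, i = 1,\dots,m\}$.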

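It is easy to see that random variables $\{P_k\check X_j^c, j = 1,\dots,n\}$ and
$\{P_t\check X_j^c, j = 1,\dots,n, t \neq k\}$ are independent. Since they are mean zero normal random
variables and $\check X_j^c$, $j = 1,\dots,n$ are independent, it is enough to check that,
for all $j = 1,\dots,n$, $t \neq k$, $P_k\check X_j^c$ and $P_t\check X_j^c$ are uncorrelated. To this end,
observe that

$$\mathbb{E}(P_k\check X_j^c \otimes P_t\check X_j^c) = P_k\,\mathbb{E}(\check X_j^c \otimes \check X_j^c)\,P_t$$

$$= \frac{1}{4}\begin{pmatrix} P_k^{uu} & P_k^{uv} \\ P_k^{vu} & P_k^{vv} \end{pmatrix}\begin{pmatrix} \tau^2I_m & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} P_t^{uu} & P_t^{uv} \\ P_t^{vu} & P_t^{vv} \end{pmatrix}$$

$$= \frac{\tau^2}{4}\begin{pmatrix} P_k^{uu}P_t^{uu} & P_k^{uu}P_t^{uv} \\ P_k^{vu}P_t^{uu} & P_k^{vu}P_t^{uv} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$

where we used orthogonality relationships (1.2). Quite similarly, one can
prove independence of $\{P_k\check X_i^r, i = 1,\dots,m\}$ and $\{P_t\check X_i^r, i = 1,\dots,m, t \neq k\}$.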

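We will now provide an upper bound on $\mathbb{E}|I_1(x)|^2\mathbb{1}(\|\Gamma\| \le \delta_{n,m})$. To
this end, define

$$\omega_j(x) = \bigl\langle (\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)e_{m+j}^{m+n}, x\bigr\rangle, \quad j = 1,\dots,n,$$

$$\omega_j(x) = \omega_j^{(1)}(x) + i\omega_j^{(2)}(x) \in \mathbb{C}.$$

Let $I_1(x) = \kappa^{(1)}(x) + i\kappa^{(2)}(x) \in \mathbb{C}$. Then, conditionally on
$\{P_t\check X_j^c : t \neq k, j = 1,\dots,n\}$, the random vector $(\kappa^{(1)}(x), \kappa^{(2)}(x))$ has the same
distribution as a mean zero Gaussian random vector in $\mathbb{R}^2$ with covariance

$$\Bigl(\frac{\tau^2}{2}\sum_{j=1}^{n} \omega_j^{(k_1)}(x)\,\omega_j^{(k_2)}(x)\Bigr), \quad k_1, k_2 = 1, 2$$

(to check the last claim, it is enough to compute conditional covariance
of $(\kappa^{(1)}(x), \kappa^{(2)}(x))$ given $\{P_t\check X_j^c : t \neq k, j = 1,\dots,n\}$ using the fact that
$(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)$ is a function of $\{P_t\check X_j^c, t \neq k, j = 1,\dots,n\}$). Therefore,

$$\mathbb{E}\bigl(|I_1(x)|^2 \,\big|\, P_t\check X_j^c : t \neq k, j = 1,\dots,n\bigr) = \mathbb{E}\bigl((\kappa^{(1)}(x))^2 + (\kappa^{(2)}(x))^2 \,\big|\, P_t\check X_j^c : t \neq k, j = 1,\dots,n\bigr)$$

$$= \frac{\tau^2}{2}\sum_{j=1}^{n} \bigl((\omega_j^{(1)}(x))^2 + (\omega_j^{(2)}(x))^2\bigr) = \frac{\tau^2}{2}\sum_{j=1}^{n} |\omega_j(x)|^2.$$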

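Furthermore,

$$
\begin{aligned}
\frac{\tau^2}{2}\sum_{j=1}^n|\omega_j(x)|^2
&= \frac{\tau^2}{2}\sum_{j=1}^n\bigl|\bigl\langle\tilde R_B(\eta)(\Gamma\tilde R_B(\eta))^{s-2}x, e_{m+j}^{m+n}\bigr\rangle\bigr|^2\\
&\le \frac{\tau^2}{2}\bigl\langle\tilde R_B(\eta)(\Gamma\tilde R_B(\eta))^{s-2}x, \tilde R_B(\eta)(\Gamma\tilde R_B(\eta))^{s-2}x\bigr\rangle\\
&\le \frac{\tau^2}{2}\|\tilde R_B(\eta)\|^{2(s-1)}\|\Gamma\|^{2(s-2)}\|x\|^2 .
\end{aligned}
$$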

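Under the assumption $\delta_{n,m} < \frac{\bar g_k}{2}$, the following inclusion holds:

$$\{\|\Gamma\|\le\delta_{n,m}\}\subset\Bigl\{\frac{\tau^2}{2}\sum_{j=1}^n|\omega_j(x)|^2\le\frac{\tau^2}{2}\Bigl(\frac{2}{\bar g_k}\Bigr)^{2(s-1)}\delta_{n,m}^{2(s-2)}\|x\|^2\Bigr\}=:G.$$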


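Therefore,

$$
\begin{aligned}
\mathbb{E}|I_1(x)|^2\mathbf{1}(\|\Gamma\|\le\delta_{n,m}) &\le \mathbb{E}|I_1(x)|^2\mathbf{1}_G
= \mathbb{E}\,\mathbb{E}\bigl(|I_1(x)|^2 \bigm| P_tX_j^c, t\neq k, j=1,\dots,n\bigr)\mathbf{1}_G\\
&= \mathbb{E}\,\frac{\tau^2}{2}\sum_{j=1}^n|\omega_j(x)|^2\,\mathbf{1}_G
\le \frac{\tau^2}{2}\Bigl(\frac{2}{\bar g_k}\Bigr)^{2(s-1)}\delta_{n,m}^{2(s-2)}\|x\|^2 .
\end{aligned}
\tag{2.10}
$$

A similar bound holds also for $\mathbb{E}|I_2(x)|^2\mathbf{1}(\|\Gamma\|\le\delta_{n,m})$:

$$\mathbb{E}|I_2(x)|^2\mathbf{1}(\|\Gamma\|\le\delta_{n,m}) \le \frac{\tau^2}{2}\Bigl(\frac{2}{\bar g_k}\Bigr)^{2(s-1)}\delta_{n,m}^{2(s-2)}\|x\|^2 . \tag{2.11}$$

For the proof, it is enough to observe that

$$
\begin{aligned}
I_2(x) &= \sum_{j=1}^n\langle e_{m+j}^{m+n},\theta_i\rangle\bigl\langle(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)X_j^c, x\bigr\rangle\\
&= \Bigl\langle(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Bigl(\sum_{j=1}^n X_j^c\otimes e_{m+j}^{m+n}\Bigr)\theta_i, x\Bigr\rangle\\
&= \Bigl\langle(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Bigl(\sum_{l=1}^m e_l^{m+n}\otimes X_l^r\Bigr)\theta_i, x\Bigr\rangle\\
&= \sum_{l=1}^m\langle X_l^r,\theta_i\rangle\bigl\langle(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)e_l^{m+n}, x\bigr\rangle
\end{aligned}
$$

and to repeat the previous conditioning argument (this time, given
$\{P_tX_l^r : t\neq k, l=1,\dots,m\}$).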

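Combining bounds (2.10), (2.11) and (2.9), we get

$$\mathbb{E}\bigl|\bigl\langle(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Gamma\theta_i, x\bigr\rangle\bigr|^2\mathbf{1}(\|\Gamma\|\le\delta_{n,m}) \le 2\tau^2\Bigl(\frac{2}{\bar g_k}\Bigr)^{2(s-1)}\delta_{n,m}^{2(s-2)}\|x\|^2 .$$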




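Then, it follows that

$$
\begin{aligned}
&\Bigl|\mathbb{E}\bigl\langle\Gamma(R_B(\eta)\Gamma)^{r-s}P_ky,\theta_i\bigr\rangle\bigl\langle(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Gamma\theta_i, x\bigr\rangle\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\Bigr|\\
&\quad\le \Bigl(\mathbb{E}\bigl|\bigl\langle\Gamma(R_B(\eta)\Gamma)^{r-s}P_ky,\theta_i\bigr\rangle\bigr|^2\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\Bigr)^{1/2}\\
&\qquad\times\Bigl(\mathbb{E}\bigl|\bigl\langle(\tilde R_B(\eta)\Gamma)^{s-2}\tilde R_B(\eta)\Gamma\theta_i, x\bigr\rangle\bigr|^2\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\Bigr)^{1/2}\\
&\quad\le \sqrt{2}\,\tau\Bigl(\frac{2\delta_{n,m}}{\bar g_k}\Bigr)^{r-1}\|x\|\|y\|,
\end{aligned}
$$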


which, taking into account (EH), implies that 


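$$\Bigl|\mathbb{E}\bigl\langle(\tilde R_B(\eta)\Gamma)^{s-1}P_k\Gamma(R_B(\eta)\Gamma)^{r-s}P_ky, x\bigr\rangle\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\Bigr| \le \sqrt{2}\,\nu_k\tau\Bigl(\frac{2\delta_{n,m}}{\bar g_k}\Bigr)^{r-1}\|x\|\|y\|.$$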


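Since $(\tilde R_B(\eta)\Gamma)^rP_k = (\tilde R_B(\eta)\Gamma)^{r-1}\tilde R_B(\eta)\Gamma P_k$, it can be proved by a similar
argument that

$$\Bigl|\mathbb{E}\bigl\langle(\tilde R_B(\eta)\Gamma)^rP_ky, x\bigr\rangle\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\Bigr| \le \sqrt{2}\,\nu_k\tau\,\frac{2}{\bar g_k}\Bigl(\frac{2\delta_{n,m}}{\bar g_k}\Bigr)^{r-1}\|x\|\|y\|.$$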
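The exponent bookkeeping behind the Cauchy-Schwarz step above can be spelled out as follows. This is only a sketch: it assumes $\|R_B(\eta)\|\le 2/\bar g_k$ for $\eta\in\gamma_k$, $\|\theta_i\|=1$ and $\|P_k\|\le 1$, and it takes the second-moment bound with constant $2\tau^2$ as given.

```latex
% First factor: deterministic bound on the event {\|\Gamma\| \le \delta_{n,m}}.
% Second factor: square root of the second-moment bound with constant 2\tau^2.
\begin{align*}
\bigl|\langle \Gamma(R_B(\eta)\Gamma)^{r-s}P_ky,\theta_i\rangle\bigr|\,\mathbf{1}(\|\Gamma\|\le\delta_{n,m})
 &\le \delta_{n,m}^{\,r-s+1}\Bigl(\frac{2}{\bar g_k}\Bigr)^{r-s}\|y\|,\\
\Bigl(2\tau^2\Bigl(\frac{2}{\bar g_k}\Bigr)^{2(s-1)}\delta_{n,m}^{2(s-2)}\|x\|^2\Bigr)^{1/2}
 &= \sqrt{2}\,\tau\Bigl(\frac{2}{\bar g_k}\Bigr)^{s-1}\delta_{n,m}^{\,s-2}\|x\|,\\
\delta_{n,m}^{\,r-s+1}\Bigl(\frac{2}{\bar g_k}\Bigr)^{r-s}\cdot
\sqrt{2}\,\tau\Bigl(\frac{2}{\bar g_k}\Bigr)^{s-1}\delta_{n,m}^{\,s-2}
 &= \sqrt{2}\,\tau\Bigl(\frac{2\delta_{n,m}}{\bar g_k}\Bigr)^{r-1}.
\end{align*}
```

The powers of $\delta_{n,m}$ and of $2/\bar g_k$ both add up to $r-1$, independently of $s$, which is what makes the subsequent summation over $r$ a geometric-type series.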


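Therefore, substituting the above bounds in (2.6) and taking into account
that $|\mu_k-\eta| = \frac{\bar g_k}{2}$, $\eta\in\gamma_k$ and that the length of the contour of integration
$\gamma_k$ is equal to $\pi\bar g_k$, we get

$$
\begin{aligned}
\bigl|\mathbb{E}\langle P_k^{\perp}S_k(\Gamma)P_ky, x\rangle\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\bigr|
&\le \sum_{r\ge2}\frac{r\bar g_k}{2}\Bigl(\frac{2}{\bar g_k}\Bigr)^2\sqrt{2}\,\nu_k\tau\Bigl(\frac{2\delta_{n,m}}{\bar g_k}\Bigr)^{r-1}\|x\|\|y\|\\
&= \frac{2}{\bar g_k}\sqrt{2}\,\nu_k\tau\sum_{r\ge2}r\Bigl(\frac{2\delta_{n,m}}{\bar g_k}\Bigr)^{r-1}\|x\|\|y\|\\
&\lesssim_\gamma \nu_k\tau\frac{\delta_{n,m}}{\bar g_k^2}\|x\|\|y\|,
\end{aligned}
$$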


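where we also used the condition $\delta_{n,m}\le(1-\gamma/2)\frac{\bar g_k}{2}$ implying that
$\frac{2\delta_{n,m}}{\bar g_k}\le 1-\gamma/2$. Clearly, this implies that

$$\bigl\|\mathbb{E}P_k^{\perp}S_k(\Gamma)P_k\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\bigr\| \lesssim_\gamma \nu_k\tau\frac{\delta_{n,m}}{\bar g_k^2} \lesssim_\gamma \frac{\nu_k\tau^2\sqrt{m\vee n}}{\bar g_k^2}.$$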
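The passage from the series over $r$ to a bound with a constant depending only on $\gamma$ can be made explicit. The sketch below assumes that the series has the form $\sum_{r\ge2}r\,q^{r-1}$ with $q:=2\delta_{n,m}/\bar g_k\le 1-\gamma/2$; the numerical constants are illustrative and are not claimed to be the ones intended by the authors.

```latex
% q := 2\delta_{n,m}/\bar g_k, assumed to satisfy 0 \le q \le 1-\gamma/2,
% so that 1-q \ge \gamma/2.
\begin{align*}
\sum_{r\ge2} r\,q^{r-1} &= \frac{1}{(1-q)^2}-1 = \frac{q(2-q)}{(1-q)^2}
 \le \frac{2q}{(\gamma/2)^2} = \frac{8q}{\gamma^2},\\
\frac{2}{\bar g_k}\sqrt{2}\,\nu_k\tau\sum_{r\ge2} r\,q^{r-1}
 &\le \frac{32\sqrt{2}}{\gamma^2}\,\nu_k\tau\,\frac{\delta_{n,m}}{\bar g_k^{2}} .
\end{align*}
```

The first identity is the derivative of the geometric series $\sum_{r\ge0}q^r=(1-q)^{-1}$ with the $r=1$ term removed.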


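Furthermore, the same bound, obviously, holds for

$$\bigl|\mathbb{E}\langle P_kS_k(\Gamma)P_k^{\perp}y, x\rangle\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\bigr| = \bigl|\mathbb{E}\langle P_k^{\perp}S_k(\Gamma)P_kx, y\rangle\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\bigr|$$

and, by similar arguments, it can be demonstrated that it also holds for

$$\bigl\|\mathbb{E}P_k^{\perp}S_k(\Gamma)P_k^{\perp}\mathbf{1}(\|\Gamma\|\le\delta_{n,m})\bigr\|$$

(the only different term in this case is $(\tilde R_B(\eta)\Gamma)^r\tilde R_B(\eta)$, but, since $\{\mu_t, t\neq k\}$
are outside of the circle $\gamma_k$, it simply leads to $\oint_{\gamma_k}(\tilde R_B(\eta)\Gamma)^r\tilde R_B(\eta)\,d\eta = 0$).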

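It remains to observe that

$$
\begin{aligned}
&\bigl\|\mathbb{E}\bigl(P_k^{\perp}S_k(\Gamma)P_k + P_kS_k(\Gamma)P_k^{\perp} + P_k^{\perp}S_k(\Gamma)P_k^{\perp}\bigr)\mathbf{1}(\|\Gamma\|>\delta_{n,m})\bigr\|\\
&\quad\le \mathbb{E}\bigl\|P_k^{\perp}S_k(\Gamma)P_k + P_kS_k(\Gamma)P_k^{\perp} + P_k^{\perp}S_k(\Gamma)P_k^{\perp}\bigr\|\mathbf{1}(\|\Gamma\|>\delta_{n,m})\\
&\quad\lesssim \mathbb{E}\|S_k(\Gamma)\|\mathbf{1}(\|\Gamma\|>\delta_{n,m})\\
&\quad\le \bigl(\mathbb{E}\|S_k(\Gamma)\|^2\bigr)^{1/2}\mathbb{P}^{1/2}(\|\Gamma\|>\delta_{n,m})\\
&\quad\lesssim \frac{\mathbb{E}^{1/2}\|\Gamma\|^4}{\bar g_k^2}\,\mathbb{P}^{1/2}(\|\Gamma\|>\delta_{n,m})
\lesssim_\gamma \frac{\tau^2\sqrt{m\vee n}}{\bar g_k^2}
\end{aligned}
$$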

and to substitute the above bounds to identity (EH) to get that 
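$$\bigl\|\mathbb{E}\tilde P_k - P_k - P_k\mathbb{E}S_k(\Gamma)P_k\bigr\| \lesssim_\gamma \frac{\nu_k\tau^2\sqrt{m\vee n}}{\bar g_k^2},$$

which implies the claim of the theorem.

□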


Proof of Theorem 1.3. By a simple computation (see Lemma 8 and the derivation
of (6.6) in [5]), the following identity holds:

$$
\begin{aligned}
\bigl\langle\tilde\theta_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, x\bigr\rangle
&= \frac{\rho_k(x)}{\sqrt{1+b_k+\rho_k(\theta_{i_k})}}\\
&\quad - \frac{\sqrt{1+b_k}}{\sqrt{1+b_k+\rho_k(\theta_{i_k})}\bigl(\sqrt{1+b_k+\rho_k(\theta_{i_k})}+\sqrt{1+b_k}\bigr)}\,\rho_k(\theta_{i_k})\langle\theta_{i_k}, x\rangle,
\end{aligned}
\tag{2.12}
$$

where $\rho_k(x) := \bigl\langle(\tilde P_k - (1+b_k)P_k)\theta_{i_k}, x\bigr\rangle$, $x\in\mathbb{R}^{m+n}$. In what follows, assume
that $\|x\| = 1$. By the bounds of Theorems 1.1 and 1.2, with probability at
least $1-e^{-t}$:


$$|\rho_k(x)| \lesssim_\gamma \frac{\tau\sqrt{t}}{\bar g_k}\Bigl(\frac{\tau\sqrt{m\vee n}+\tau\sqrt{t}}{\bar g_k} + 1\Bigr).$$
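Identity (2.12) can be checked directly. The derivation below is a sketch under assumptions that are not spelled out in this part of the text: the empirical projector is rank one, $\tilde P_k=\tilde\theta_{i_k}\otimes\tilde\theta_{i_k}$, the population vector satisfies $P_k\theta_{i_k}=\theta_{i_k}$ with $\|\theta_{i_k}\|=1$, and the sign of $\tilde\theta_{i_k}$ is chosen so that $\langle\tilde\theta_{i_k},\theta_{i_k}\rangle\ge0$.

```latex
% Step 1: identify <\tilde\theta,\theta> from the quadratic form of \tilde P_k.
% Step 2: express <\tilde\theta,x> through \tilde P_k \theta.
% Step 3: rationalize the difference of square roots.
\begin{align*}
\langle\tilde P_k\theta_{i_k},\theta_{i_k}\rangle
 &= \langle\tilde\theta_{i_k},\theta_{i_k}\rangle^2 = 1+b_k+\rho_k(\theta_{i_k}),\\
\langle\tilde\theta_{i_k},x\rangle
 &= \frac{\langle\tilde P_k\theta_{i_k},x\rangle}{\langle\tilde\theta_{i_k},\theta_{i_k}\rangle}
  = \frac{(1+b_k)\langle\theta_{i_k},x\rangle+\rho_k(x)}{\sqrt{1+b_k+\rho_k(\theta_{i_k})}},\\
\frac{1+b_k}{\sqrt{1+b_k+\rho_k(\theta_{i_k})}}-\sqrt{1+b_k}
 &= -\frac{\sqrt{1+b_k}\,\rho_k(\theta_{i_k})}
   {\sqrt{1+b_k+\rho_k(\theta_{i_k})}\bigl(\sqrt{1+b_k+\rho_k(\theta_{i_k})}+\sqrt{1+b_k}\bigr)} .
\end{align*}
```

Subtracting $\sqrt{1+b_k}\langle\theta_{i_k},x\rangle$ from the second line and using the third line gives the two terms of the identity.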



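The assumption $\mathbb{E}\|X\|\le(1-\gamma)\frac{\bar g_k}{2}$ implies that $\tau\sqrt{m\vee n}\lesssim\bar g_k$. Therefore, if
$t$ satisfies the assumption $\frac{\tau\sqrt{t}}{\bar g_k}\le c_\gamma$ for a sufficiently small constant $c_\gamma>0$,
then we have $|\rho_k(x)|\le\gamma/2$ and, applying the same bound with $\theta_{i_k}$ in place of $x$,
$|\rho_k(\theta_{i_k})|\le\gamma/2$ (both on an event of probability at least $1-2e^{-t}$). By the
assumption that $1+b_k\ge\gamma$, this implies that $1+b_k+\rho_k(\theta_{i_k})\ge\gamma/2$. Thus, it
easily follows from identity (2.12) that with probability at least $1-2e^{-t}$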


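$$\bigl|\bigl\langle\tilde\theta_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, x\bigr\rangle\bigr| \lesssim_\gamma \frac{\tau\sqrt{t}}{\bar g_k}\Bigl(\frac{\tau\sqrt{m\vee n}+\tau\sqrt{t}}{\bar g_k} + 1\Bigr).$$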

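It remains to show that the same bound holds when $\frac{\tau\sqrt{t}}{\bar g_k} > c_\gamma$. In this
case, we simply have that

$$\bigl|\bigl\langle\tilde\theta_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, x\bigr\rangle\bigr| \le \|\tilde\theta_{i_k}\| + (1+b_k)^{1/2}\|\theta_{i_k}\| \le 2 \lesssim_\gamma \frac{\tau\sqrt{t}}{\bar g_k},$$

which implies the bound of the theorem.

□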


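Proof of Corollary 1.5. By a simple algebra,

$$
\begin{aligned}
|\hat b_k - b_k| = \bigl|\langle\tilde\theta^1_{i_k},\tilde\theta^2_{i_k}\rangle - (1+b_k)\bigr|
&\le \sqrt{1+b_k}\,\bigl|\bigl\langle\tilde\theta^1_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, \theta_{i_k}\bigr\rangle\bigr|\\
&\quad + \sqrt{1+b_k}\,\bigl|\bigl\langle\tilde\theta^2_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, \theta_{i_k}\bigr\rangle\bigr|\\
&\quad + \bigl|\bigl\langle\tilde\theta^1_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, \tilde\theta^2_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}\bigr\rangle\bigr| .
\end{aligned}
$$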
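The "simple algebra" is a bilinear expansion around $\sqrt{1+b_k}\,\theta_{i_k}$. The sketch below assumes that $\hat b_k=\langle\tilde\theta^1_{i_k},\tilde\theta^2_{i_k}\rangle-1$ and $\|\theta_{i_k}\|=1$; the shorthand $a_1$, $a_2$, $c$ is introduced here only for illustration.

```latex
% Shorthand (hypothetical): c := \sqrt{1+b_k},
% a_1 := \tilde\theta^1_{i_k} - c\,\theta_{i_k},  a_2 := \tilde\theta^2_{i_k} - c\,\theta_{i_k}.
\begin{align*}
\langle\tilde\theta^1_{i_k},\tilde\theta^2_{i_k}\rangle
 &= \langle a_1+c\,\theta_{i_k},\,a_2+c\,\theta_{i_k}\rangle
  = c^2 + c\,\langle a_1,\theta_{i_k}\rangle + c\,\langle a_2,\theta_{i_k}\rangle + \langle a_1,a_2\rangle,\\
\langle\tilde\theta^1_{i_k},\tilde\theta^2_{i_k}\rangle-(1+b_k)
 &= c\,\langle a_1,\theta_{i_k}\rangle + c\,\langle a_2,\theta_{i_k}\rangle + \langle a_1,a_2\rangle .
\end{align*}
```

The triangle inequality applied to the last line yields the three terms that are bounded separately in the proof.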
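Corollary 1.5 implies that with probability at least $1-e^{-t}$

$$\sqrt{1+b_k}\,\bigl|\bigl\langle\tilde\theta^1_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, \theta_{i_k}\bigr\rangle\bigr| \lesssim_\gamma \frac{\tau\sqrt{t}}{\bar g_k}\Bigl[\frac{\tau\sqrt{m\vee n}+\tau\sqrt{t}}{\bar g_k} + 1\Bigr],$$

where we also used the fact that $1+b_k\in[0,1]$. A similar bound holds with
the same probability for

$$\sqrt{1+b_k}\,\bigl|\bigl\langle\tilde\theta^2_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, \theta_{i_k}\bigr\rangle\bigr| .$$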
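To control the remaining term

$$\bigl|\bigl\langle\tilde\theta^1_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, \tilde\theta^2_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}\bigr\rangle\bigr| ,$$

note that $\tilde\theta^1_{i_k}$ and $\tilde\theta^2_{i_k}$ are independent. Thus, applying the bound of
Theorem 1.3 conditionally on $\tilde\theta^2_{i_k}$, we get that with probability at least $1-e^{-t}$

$$\bigl|\bigl\langle\tilde\theta^1_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, \tilde\theta^2_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}\bigr\rangle\bigr| \lesssim_\gamma \frac{\tau\sqrt{t}}{\bar g_k}\Bigl[\frac{\tau\sqrt{m\vee n}+\tau\sqrt{t}}{\bar g_k} + 1\Bigr]\bigl\|\tilde\theta^2_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}\bigr\| .$$

It remains to observe that

$$\bigl\|\tilde\theta^2_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}\bigr\| \le 2$$

to complete the proof of bound (1.9).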

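Assume that $\|x\|\le 1$. Recall that under the assumptions of the corollary,
$\tau\sqrt{m\vee n}\lesssim\bar g_k$ and, if $\frac{\tau\sqrt{t}}{\bar g_k}\le c_\gamma$ for a sufficiently small constant $c_\gamma$,
then bound (1.9) implies that $|\hat b_k - b_k|\le\gamma/4$ (on the event of probability at
least $1-e^{-t}$). Since $1+b_k\ge\gamma/2$, on the same event we also have $1+\hat b_k\ge\gamma/4$
implying that $\hat\theta_{i_k} = \frac{\tilde\theta^1_{i_k}}{\sqrt{1+\hat b_k}}$. Therefore,

$$
\begin{aligned}
\bigl|\langle\hat\theta_{i_k} - \theta_{i_k}, x\rangle\bigr|
&= \Bigl|\Bigl\langle\frac{\tilde\theta^1_{i_k}}{\sqrt{1+\hat b_k}} - \theta_{i_k}, x\Bigr\rangle\Bigr|\\
&\lesssim_\gamma \bigl|\bigl\langle\tilde\theta^1_{i_k} - \sqrt{1+b_k}\,\theta_{i_k}, x\bigr\rangle\bigr| + \bigl|\sqrt{1+\hat b_k} - \sqrt{1+b_k}\bigr| .
\end{aligned}
\tag{2.13}
$$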


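The first term in the right hand side can be bounded using Theorem 1.3 and,
for the second term,

$$\bigl|\sqrt{1+\hat b_k} - \sqrt{1+b_k}\bigr| = \frac{|\hat b_k - b_k|}{\sqrt{1+\hat b_k} + \sqrt{1+b_k}} \lesssim_\gamma |\hat b_k - b_k| ,$$

so bound (1.9) can be used. Substituting these bounds in (2.13), we derive
(1.10) in the case when $\frac{\tau\sqrt{t}}{\bar g_k}\le c_\gamma$.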

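In the opposite case, when $\frac{\tau\sqrt{t}}{\bar g_k} > c_\gamma$, we have

$$\bigl|\langle\hat\theta_{i_k} - \theta_{i_k}, x\rangle\bigr| \le \frac{1}{\sqrt{(1+\hat b_k)\vee\frac{\gamma}{4}}} + 1 \le \frac{2}{\sqrt{\gamma}} + 1 .$$

Therefore,

$$\bigl|\langle\hat\theta_{i_k} - \theta_{i_k}, x\rangle\bigr| \lesssim_\gamma \frac{\tau\sqrt{t}}{\bar g_k},$$

which implies (1.10) in this case.

□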